I have a question about using indexes in Oracle.
Suppose a table has a B-tree index created on column columnA, which has high cardinality or is even unique.
This table has 1 million rows.
I run a query with a predicate like WHERE columnA LIKE 'A%'.
Suppose in this example the query returns 90 percent of the rows.
So some questions:
Will Oracle decide on its own whether to use the index or not?
If Oracle decides itself, is there a percentage threshold (for example 50%) at which it will use the index? For example, at 30% it uses the index but at 40% it does not? And when would I actually benefit from the index?
Can I force Oracle to use (or not use) the index myself, depending on the query?
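For example, I believe Oracle has optimizer hints; is something like the following (idx_columna is a hypothetical index name) the right way to force the choice?

-- Hypothetical hint asking the optimizer to use the index on columnA
SELECT /*+ INDEX(t idx_columna) */ *
FROM   my_table t
WHERE  t.columnA LIKE 'A%';

-- Or the opposite, forcing a full table scan
SELECT /*+ FULL(t) */ *
FROM   my_table t
WHERE  t.columnA LIKE 'A%';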
Thanks in advance.
Related
If, for example, I have a composite non-clustered index like the following:
CREATE NONCLUSTERED INDEX idx_Test ON dbo.Persons(IsActive, UserName)
Based on this answer: How important is the order of columns in indexes?
If I run this query:
Select * From Persons Where UserName='Smith'
In the query above, IsActive, which is the first column in the non-clustered index, is not present. Does that mean the SQL Server query optimizer will skip the index because IsActive is not in the predicate, or what?
Of course I can just test it and check the execution plan, and I will do that, but I'm also curious about the theory behind it. When does cardinality matter and when does it not?
SQL Server will scan the whole index; in this case it might pick the narrowest index available.
Below is a small example on an orders table I have.
The query predicate (shipperid = 'G') satisfies 199,748 rows, but SQL Server has to read all 998,123 rows to get the data. This is visible by comparing Number of Rows Read with Actual Number of Rows in the plan.
I found this explanation from Craig Freedman to be very useful. Assuming you have an index on (a, b), SQL Server can effectively seek on the following:
a=somevalue and b=somevalue
a=someval and b>0
a=someval and b>=0
For the operation below, SQL Server will filter out as many rows as possible with the first predicate (this is also the reason you may have heard to put the column with more unique values first) and apply the second predicate as a residual:
a>=somevalue and b=someval
For the case below, SQL Server has to scan the entire index (a small T-SQL sketch of all three cases follows below):
b=someval
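A minimal sketch of those three situations, using a hypothetical dbo.T table with a composite index on (a, b):

-- Hypothetical table and composite index
CREATE TABLE dbo.T (a INT, b INT, c VARCHAR(50));
CREATE NONCLUSTERED INDEX IX_T_a_b ON dbo.T (a, b);

-- Both predicates can be seek predicates
SELECT a, b FROM dbo.T WHERE a = 5 AND b = 10;

-- Range seek on the leading column a; b is applied as a residual predicate
SELECT a, b FROM dbo.T WHERE a >= 5 AND b = 10;

-- No predicate on the leading column: the whole index must be scanned
SELECT a, b FROM dbo.T WHERE b = 10;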
Further reading:
Craig Freedman's SQL Server Blog: Seek Predicates
Probe Residual when you have a Hash Match – a hidden cost in execution plans: Rob Farley
The Tipping Point Query Answers: Kimberly L. Tripp
After I created the indexed view, I tried disabling all the indexes in the base tables, including the indexes on the foreign key columns (the constraints are still there), and the query plan for the view stays the same.
It is like magic to me that the indexed view can optimize the query so much even without the base tables being indexed. Even without any additional index on the view, SQL Server is able to do an index scan on the primary key index of the indexed view and retrieve data something like 1000 times faster than going through the base table.
Something like SELECT * FROM MyView WITH(NOEXPAND) WHERE NotIndexedColumn = 5 ORDER BY NotIndexedColumn
So the first two questions are:
Is there any benefit to indexing the base tables of an indexed view?
What is SQL Server doing when it does an index scan on the PK while the filter is on a non-indexed column?
Then I noticed that if I use full-text search + ORDER BY, I see a Table Spool (Eager Spool) in the query plan with a cost of around 95%.
Query looks like SELECT ID FROM View WITH(NOEXPAND) WHERE CONTAINS(IndexedColumn, '"SomeText*"') ORDER BY IndexedColumn
Question n° 3:
Is there any index I could add to get rid of that operation?
It's important to understand that an indexed view is a "materialized view": the results are stored on disk.
So the speedup you are seeing comes from the fact that the view's result set has already been computed and written to disk.
To answer your questions:
1) Is there any benefit to index base tables of indexed view?
This is situational. If your view flattens out data or adds many aggregate columns, then the indexed view is better than the table. If you are just using your indexed view like this:
SELECT * FROM foo WHERE createdDate > getDate()
then probably not.
But if you are doing something like SELECT SUM(price), MIN(id) FROM x GROUP BY id, price, then the indexed view would probably be better. Granted, your actual query is more complex, with joins and other advanced options.
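Roughly, an indexed view that pre-aggregates like that is created along these lines (a sketch against a made-up dbo.Sales table; note that an indexed view with GROUP BY must be schema-bound and include COUNT_BIG(*)):

-- Hypothetical base table: dbo.Sales(id, price), with price assumed NOT NULL
CREATE VIEW dbo.vSalesTotals
WITH SCHEMABINDING
AS
SELECT id,
       SUM(price)   AS total_price,
       COUNT_BIG(*) AS row_count   -- required when the view uses GROUP BY
FROM dbo.Sales
GROUP BY id;
GO

-- This unique clustered index is what materializes the view's rows on disk
CREATE UNIQUE CLUSTERED INDEX IX_vSalesTotals ON dbo.vSalesTotals (id);

Once the clustered index exists, the aggregated rows are stored and maintained on disk, which is where the speedup you observed comes from.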
2) What is SQL Server doing when it does an index scan on the PK while the filter is on a non-indexed column?
First we need to understand how clustered indexes are stored: the index is a B-tree, so when you search on a clustered index SQL Server walks the tree to find all values that match your criteria. How your indexes are set up (covering vs. non-covering, and how your non-clustered indexes are defined) determines what the pages and extents look like. Without more knowledge of the table structure I can't tell you what the scan is actually doing.
3) Is there any index I could add to get rid of that operation?
Just because something is taking 95% of the query's cost doesn't make it a bad thing. The operator costs in a plan always add up to 100%, so no matter what you do there will always be something taking up a large share. What you need to check is the number of IO reads and how long the query itself takes.
To measure this, keep in mind that SQL Server caches data pages (and compiled plans). A query can therefore take a long time the first time but run much quicker afterward, since the data is already in cache. It all depends on the frequency of the query and how your system is set up.
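A quick way to see those reads and timings for the query in question (MyView and the predicate are taken from the examples above):

SET STATISTICS IO ON;   -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports parse/compile and execution time

SELECT ID
FROM MyView WITH (NOEXPAND)
WHERE CONTAINS(IndexedColumn, '"SomeText*"')
ORDER BY IndexedColumn;

Running it twice makes the caching effect described above visible: physical reads drop on the second run.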
For a more in-depth read on indexed views:
Good Day,
I would like to check what the best way is to partition a Postgres table on a column's prefix. I have a large table (roughly 300 to 750 million rows x 10 columns) and I would like to partition it on a prefix of column 1.
Data looks like:
ABCDEF1xxxxxxxx
ABCDEF1xxxxxxxy
ABCDEF1xxxxxxxz
ABCDEF2xxxxxxxx
ABCDEF2xxxxxxxy
ABCDEF2xxxxxxxz
ABCDEF3xxxxxxxx
ABCDEF3xxxxxxxz
ABCDEF4xxxxxxxx
ABCDEF4xxxxxxxy
There will only ever be 10 partitions, i.e. ABCDEF0... -> ABCDEF9...
What I've currently done is make tables like:
CREATE TABLE public.mydata_ABCDEF1 (
CHECK ( col1 like 'ABCDEF1%' )
) INHERITS (public.mydata);
CREATE TABLE public.mydata_ABCDEF2 (
CHECK ( col1 like 'ABCDEF2%' )
) INHERITS (public.mydata);
etc. Then the trigger with similar logic:
IF ( NEW.col1 like 'ABCDEF1%' ) THEN
    INSERT INTO public.mydata_ABCDEF1 VALUES (NEW.*);
ELSIF ( NEW.col1 like 'ABCDEF2%' ) THEN
    INSERT INTO public.mydata_ABCDEF2 VALUES (NEW.*);
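For completeness, that IF/ELSIF chain sits inside a trigger function along these lines (a sketch; the RETURN NULL tells Postgres the row has already been redirected to a child table):

CREATE OR REPLACE FUNCTION public.mydata_insert_trigger()
RETURNS trigger AS $$
BEGIN
    -- IF / ELSIF chain from above goes here, one branch per prefix
    RETURN NULL;  -- the row was inserted into a child table, not the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_mydata_trigger
    BEFORE INSERT ON public.mydata
    FOR EACH ROW EXECUTE PROCEDURE public.mydata_insert_trigger();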
My concern is whether partitioning in this way will actually speed up query time, or whether I should instead partition on a substr() of the column (not sure how), or add a new column containing just the prefix and partition on that.
Any advice is appreciated.
I know this is an old question, but I am adding this answer in case anyone else needs a solution.
Postgres 10 allows declarative range partitioning: https://www.postgresql.org/docs/10/static/ddl-partitioning.html.
While the examples in the docs use date ranges, you can also use string ranges, since Postgres (mostly) uses ASCII ordering. The code below creates a parent table and then two child tables which, depending on your specific codes, should automatically bin any alphanumeric value based on the prefixes provided. The ranges do have to be non-overlapping, which is why I do not simply create a range from ABCDEF1 to ABCDEF2.
CREATE TABLE mydata (...) PARTITION BY RANGE (col1);
CREATE TABLE mydata_abcdef1 PARTITION OF mydata
    FOR VALUES FROM ('ABCDEF1') TO ('ABCDEF1z');
CREATE TABLE mydata_abcdef2 PARTITION OF mydata
    FOR VALUES FROM ('ABCDEF2') TO ('ABCDEF2z');
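As a quick sanity check (assuming the partitions above), partition pruning should be visible in the plan:

EXPLAIN SELECT * FROM mydata WHERE col1 = 'ABCDEF2xxxxxxxx';
-- the plan should only show a scan on mydata_abcdef2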
It will significantly speed up your queries when each of the partitioned tables also has its own matching index, e.g.:
CREATE INDEX ON public.mydata_ABCDEF1 (...) WHERE col1 like 'ABCDEF1%';
The short answer is "probably not," but it really depends on exactly what your queries are.
The question really is: what are you trying to accomplish with the partitioning? Generally speaking, PostgreSQL's btree index is very fast and efficient at finding the specific records you ask for, faster than PostgreSQL is at figuring out which table in a set of partitions your data is stored in.
Where partitioning is extremely useful is data management. You can often partition based on time and then, when the data has aged long enough, simply drop the older partition instead of issuing "DELETE" queries that mark records as deleted, which then have to be VACUUMed to reclaim the space and end up causing bloat in the table and indexes.
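For instance, with an INHERITS-based setup like the one in the question, retiring an old time-based partition is just DDL on the child table (the table names here are hypothetical):

-- Detach the old child from the parent, then drop it; no row-by-row DELETE needed
ALTER TABLE public.mydata_2015 NO INHERIT public.mydata;
DROP TABLE public.mydata_2015;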
300M records is about the point where I might consider partitioning, but I wouldn't jump to partitioning the data at that point without a clear reason why having the data partitioned will be helpful.
Also, be aware that PostgreSQL's query planner does not handle very large numbers of partitions well; hundreds or thousands of partitions will slow down planning. That isn't very visible in pre-9.5 versions, but from 9.5 on, EXPLAIN ANALYZE reports the planning time required for a given query:
=*> explain analyze select * from downloads;
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on downloads (cost=0.00..38591.76 rows=999976 width=193) (actual time=23.863..2088.732 rows=
Planning time: 0.219 ms
Execution time: 2552.878 ms
(3 rows)
Is it a good idea to index varchar columns that are only used in LIKE operations? From the query statistics I get for the following query:
SELECT * FROM ClientUsers WHERE Email LIKE '%niels#bosmainter%'
I get an "Estimated subtree cost" of 0.38 without any index and 0.14 with an index. Is this a good metric to use for analyzing whether a query has been optimized by an index?
Given the data 'abcdefg'
WHERE Column1 LIKE '%cde%' --can't use an index
WHERE Column1 LIKE 'abc%' --can use an index
WHERE Column1 Like '%defg' --can't use an index, but see note below
Note: If you have important queries that require '%defg', you could use a persisted computed column that REVERSE()s the column and then index it. You can then query on:
WHERE Column1Reverse LIKE REVERSE('defg')+'%' --can use the persisted computed column's index
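A minimal sketch of that reversed-column trick, using the ClientUsers/Email names from the question (the new column and index names are made up):

-- Add a persisted computed column holding the reversed value, then index it
ALTER TABLE dbo.ClientUsers
    ADD EmailReversed AS REVERSE(Email) PERSISTED;

CREATE INDEX IX_ClientUsers_EmailReversed
    ON dbo.ClientUsers (EmailReversed);

-- A suffix search becomes a prefix search on the reversed column, so it can seek
SELECT *
FROM dbo.ClientUsers
WHERE EmailReversed LIKE REVERSE('@example.com') + '%';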
In my experience, a leading %-sign makes any index on the column useless for a seek, but a % only at the end still lets the index be used.
To answer the metrics part of your question: The type of index/table scan/seek being performed is a good indicator for knowing if an index is being (properly) used. It's usually shown topmost in the query plan analyzer.
The following scan/seek types are sorted from worst (top) to best (bottom):
Table Scan
Clustered Index Scan
Index Scan
Clustered Index Seek
Index Seek
As a rule of thumb, you would normally try to get seeks over scans whenever possible. As always, there are exceptions depending on table size, queried columns, etc. I recommend doing a search on StackOverflow for "scan seek index", and you'll get a lot of good information about this subject.
I am working on optimizing a SQL query that goes against a very wide table in a legacy system. I am not able to narrow the table at this point for various reasons.
My query is running slowly because it does an Index Seek on an index I've created, and then uses a Bookmark Lookup to find the additional columns it needs that do not exist in the index. The Bookmark Lookup takes 42% of the query time (according to the query plan).
The table has 38 columns, some of which are nvarchars, so I cannot make a covering index that includes all the columns. I have tried to take advantage of index intersection by creating indexes that together cover all the columns; however, those "covering" indexes are not picked up by the execution plan and are not used.
Also, since 28 of the 38 columns are pulled out via this query, I'd have 28/38 of the columns in the table stored in these covering indexes, so I'm not sure how much this would help.
Do you think the Bookmark Lookup is as good as it is going to get, or is there another option I should try?
(I should specify that this is SQL Server 2000)
Oh,
the covering index with INCLUDE should work. Another option might be to create a clustered indexed view containing only the columns you need.
Regards,
Lieven
You could create an index with included columns as another option.
Example from BOL; this works for SQL Server 2005 and up:
CREATE NONCLUSTERED INDEX IX_Address_PostalCode
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID);
To answer this part: "I have tried to take advantage of index intersection by creating indexes that together cover all the columns; however, those 'covering' indexes are not picked up by the execution plan and are not used."
An index can only be used when the query is written so that the predicate is sargable. In other words, if you wrap the column on the left side of the operator in a function, or leave the leading column of the index out of your WHERE clause, the index won't be used. The index will also be skipped if its selectivity is too low.
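For example (dbo.Orders and OrderDate are made-up names, assuming an index exists on OrderDate):

-- Non-sargable: the function wrapped around the column hides it from the index
SELECT * FROM dbo.Orders WHERE YEAR(OrderDate) = 2008;

-- Sargable: the column is left bare, so an index on OrderDate can be seeked
SELECT * FROM dbo.Orders
WHERE OrderDate >= '20080101' AND OrderDate < '20090101';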
Check out SQL Server covering indexes for some more info