Why is an Oracle table indexed but still doing a full table scan? - database

I have a table 'MSATTRIBUTE' with about 3,000,000 rows. I used the following query to retrieve data. The query produces different execution plans on the same data in different environments: in one environment it does a full table scan and is very slow, but in another it uses index scans and performs well. Why does it do a full table scan in one environment even though I built indexes for it? How do I make it use index scans, as in the first environment, and how can I improve this query?

Without understanding way more than I care to know about your data model and your business, it's hard to give concrete positive advice. But here are some notes about your indexing strategy and why I would guess the optimizer is not using the indexes you have.
In the sub-query the access path to REDLINE_MSATTRIBUTE drives from three columns:
CLASS
OBJECT_ID
CHANGE_RELEASE_DATE.
CLASS is not indexed, but it is presumably not very selective anyway. OBJECT_ID
is the leading column of a compound index, but the other columns in that index are irrelevant to the sub-query.
But the biggest problem is CHANGE_RELEASE_DATE. This is not indexed at all, which is bad news, as your one primary-key lookup produces a date which is then compared with CHANGE_RELEASE_DATE. If a column is not indexed, the database has to read the table to get its values.
The main query drives off
ATTID
CHANGE_ID
OBJECT_ID (again)
CHANGE_RELEASE_DATE (again)
CLASS (again)
OLD_VALUE
ATTID is indexed, but how selective is that index? The optimizer probably doesn't think it's very selective. ATTID is also in a compound index with CHANGE_ID and OLD_VALUE, but none of them is the leading column, so that's not very useful. And we've discussed CLASS, CHANGE_RELEASE_DATE and OBJECT_ID already.
The optimizer will only choose to use an index if it is cheaper (fewer reads) than a table scan. This usually means the WHERE clause criteria need to map to the leading (i.e. leftmost) columns of an index. This could be the case with OBJECT_ID and ATTID in the sub-query, except that:
1) the execution plan would have to do an INDEX SKIP SCAN, because REDLINE_MSATTRIBUTE_INDEX1 has CHANGE_ID between the two columns; and
2) the database has to go to the table anyway to get CLASS and CHANGE_RELEASE_DATE.
So, you might get some improvement by building an index on (CHANGE_RELEASE_DATE, CLASS, OBJECT_ID, ATTID). But as I said upfront, without knowing more about your situation these are just ill-informed guesses.
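To see why column order and coverage matter, here is a minimal sketch using SQLite's EXPLAIN QUERY PLAN (not the poster's Oracle schema; the table contents and index name are invented for the demo). A four-column index like the one suggested turns the full scan into an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE redline_msattribute (
    object_id INTEGER, class INTEGER, attid INTEGER,
    change_release_date TEXT, old_value TEXT)""")
conn.executemany(
    "INSERT INTO redline_msattribute VALUES (?, ?, ?, ?, ?)",
    [(i % 100, 9000, 1564, "2020-01-01", "Y") for i in range(1000)])

query = """SELECT attid FROM redline_msattribute
           WHERE class = 9000 AND object_id = 42
             AND change_release_date > '2019-12-31'"""

def access_path(q):
    # EXPLAIN QUERY PLAN rows end with a human-readable 'detail' column.
    return conn.execute("EXPLAIN QUERY PLAN " + q).fetchone()[-1]

before = access_path(query)   # no usable index yet: a full table scan

conn.execute("""CREATE INDEX redline_ix ON redline_msattribute
               (object_id, class, change_release_date, attid)""")
after = access_path(query)    # now a search on redline_ix

print(before)
print(after)
```

Because the index also contains ATTID, the sub-query's select list, SQLite can answer it from the index alone, which is the same "avoid visiting the table" effect being argued for above.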

If the rows are in a different order in the two tables, then the indexes in the two systems can have different clustering factors, and hence different estimated costs for index access. Check the table and index statistics, including the clustering factor, to see if there are significant differences.
Also, do either of the systems' explain plans mention dynamic sampling?

When Oracle has an index and decides to use or not use it, it might be because:
1) You may have a different setting for OPTIMIZER_MODE - make sure it's not on RBO.
2) The data is different - in this case Oracle might evaluate the query stats differently.
3) The data is the same but the stats are not up to date. In this case, gather stats:
dbms_stats.gather_table_stats(ownname => USER, tabname => 'MSATTRIBUTE', cascade => TRUE);
4) There are a lot more reasons why Oracle will not use the index in one environment; I'd suggest comparing parameters (such as OPTIMIZER_INDEX_COST_ADJ etc.)
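Point 3 can be sketched with SQLite, whose ANALYZE plays roughly the role of dbms_stats here (a simplified stand-in, with invented table and index names): gathering stats populates the per-index row estimates that the planner costs access paths against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, category INTEGER)")
conn.executemany("INSERT INTO big_table (category) VALUES (?)",
                 [(i % 5,) for i in range(1000)])
conn.execute("CREATE INDEX big_table_cat_ix ON big_table (category)")

# Before ANALYZE there is no statistics table at all, so the planner
# falls back on built-in default assumptions.
stats_tables_before = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_stat1'").fetchone()[0]

conn.execute("ANALYZE")  # the rough equivalent of gathering stats with cascade=>true

# Afterwards sqlite_stat1 holds "total rows, rows per key" for each index.
stat = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 'big_table_cat_ix'").fetchone()[0]
print(stats_tables_before, stat)
```

The same data with and without up-to-date stats can therefore produce different plans, which is exactly the two-environments symptom in the question.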

One immediate issue is this piece: SELECT RELEASE_DATE FROM CHANGE WHERE ID = 136972355. This piece of code will run for every row coming back, and it doesn't need to. A better way of doing this is using a single Cartesian-joined table, so it only runs once and returns a static value to compare against.
Example 1:
Select * From Table1, (Select Sysdate As Compare From Dual) Table2 Where Table1.Date > Table2.Compare
is always faster than
Select * From Table1 Where Date > Sysdate
because Sysdate will get called for each row, as it is a dynamic function-based value, while the earlier example resolves it once to a literal, which is drastically faster. I believe this is definitely one piece hurting your query and forcing a table scan.
I also believe this is a more efficient way to execute the query.
Select
REDLINE_MSATTRIBUTE.ATTID
,REDLINE_MSATTRIBUTE.VALUE
From
REDLINE_MSATTRIBUTE
,(
SELECT ATTID
,CHANGE_ID
,MIN(CHANGE_RELEASE_DATE) RELEASE_DATE
FROM REDLINE_MSATTRIBUTE
,(SELECT RELEASE_DATE FROM CHANGE WHERE ID = 136972355) T_COMPARE
WHERE CLASS = 9000
And OBJECT_ID = 32718015
And CHANGE_RELEASE_DATE > T_COMPARE.RELEASE_DATE
And ATTID IN (1564, 1565)
GROUP
BY ATTID,
CHANGE_ID
) T_DYNAMIC
Where
REDLINE_MSATTRIBUTE.ATTID = T_DYNAMIC.ATTID
And REDLINE_MSATTRIBUTE.CHANGE_ID = T_DYNAMIC.CHANGE_ID
And REDLINE_MSATTRIBUTE.CHANGE_RELEASE_DATE = T_DYNAMIC.RELEASE_DATE
And CLASS = 9000
And OBJECT_ID = 32718015
And OLD_VALUE ='Y'
Order
By REDLINE_MSATTRIBUTE.ATTID,
REDLINE_MSATTRIBUTE.VALUE;

Related

Why does snowflake create table as select (CTAS) ignore order by clause?

The command is:
drop table if exists metrics_done;
create table metrics_done as select * from metrics where end_morning='2022-03-31' order by LOG_INFO desc;
The expected behaviour is creation of a table with sorted entries. But this does not happen. Why?
Snowflake does use the ORDER BY on a CTAS. You can see that by using the system$clustering_information function - subject to some limitations on high cardinality, and on how the function checks clustering state before the auto-clustering service has run with a new key at least once.
However, just because Snowflake uses the ORDER BY in a CTAS, it doesn't mean the rows will return in order without using an ORDER BY clause. Snowflake is an MPP system and will scan multiple micropartitions during a query. Without specifying an ORDER BY, there is no reason the optimizer should generate a plan that guarantees order. The plan it generates can and will return rows in the order they're ready for the result.
Here's an over-simplistic example: on a CTAS you order by date, and all rows in micropartition 1 have date 2022-01-01 while all rows in micropartition 2 have date 2022-01-02. When you select rows from that table, the scan for micropartition 2 is just as likely to finish first as micropartition 1 is. If #2 finishes first, those rows will be first in the result set.
Also, when the table becomes large and it has more micropartitions assigned to scan than there are available CPUs in the warehouse, one or more CPUs will need to scan multiple micropartitions. In this case, there's no reason to prefer to scan one micropartition before another.
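The takeaway can be sketched on a single node with SQLite (not Snowflake; the tables are invented, reusing the names from the question): even when a table was created from an ordered SELECT, only an ORDER BY on the query that reads it guarantees ordered results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (end_morning TEXT, log_info INTEGER)")
conn.executemany("INSERT INTO metrics VALUES ('2022-03-31', ?)",
                 [(i,) for i in (3, 1, 2)])

# CTAS with an ORDER BY, as in the question.
conn.execute("""CREATE TABLE metrics_done AS
                SELECT * FROM metrics
                WHERE end_morning = '2022-03-31'
                ORDER BY log_info DESC""")

# Without ORDER BY the result order is whatever is convenient for the engine;
# with ORDER BY it is guaranteed.
unordered = [r[1] for r in conn.execute("SELECT * FROM metrics_done")]
ordered = [r[1] for r in conn.execute(
    "SELECT * FROM metrics_done ORDER BY log_info DESC")]
print(unordered, ordered)  # only 'ordered' is guaranteed to be [3, 2, 1]
```

On a single-node engine the unordered read often happens to come back in storage order anyway; on an MPP system like Snowflake, as described above, it routinely will not.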

Simple select query taking too long

I have a table with around 18k rows for a certain week and another week has 22k rows.
I'm using view and indexes to retrieve the data like so
SELECT TOP 100 * FROM my_view
WHERE timestamp BETWEEN #date1 AND
#date2
But somehow, the week with 22k rows retrieves data faster (around 3-5 sec) while the other takes a minute at least. This causes my WCF service to time out. What am I missing?
Apply an index on the timestamp field.
If you already have an index on timestamp, then check which index is being used for this query in the execution plan.
The index hint will only come into play where your query involves joining tables, and where the columns being used to join to the other table matches more than one index. In that case the database engine may choose to use one index to make the join, and from investigation you may know that if it uses another index the query will perform better. In that case you provide the index hint telling the database engine which index to use.
Sample code using index hints:
select [Order].[OrgId], [OrderDetail].[ProductId]
from [Order]
inner join [OrderDetail] with (index(IX_OrderDetail_OrderId)) on [Order].[OrderId] = [OrderDetail].[OrderId]
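The hint above uses SQL Server syntax; as a runnable sketch, SQLite's analogous INDEXED BY clause forces the use of a particular index (the table and index names here are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_detail (order_id INTEGER, product_id INTEGER)")
conn.executemany("INSERT INTO order_detail VALUES (?, ?)",
                 [(i % 10, i) for i in range(100)])
conn.execute("CREATE INDEX ix_order_detail_order_id ON order_detail (order_id)")

# INDEXED BY tells the planner which index it must use for this table,
# much like SQL Server's with(index(...)) hint.
plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT product_id FROM order_detail INDEXED BY ix_order_detail_order_id
    WHERE order_id = 3""").fetchone()[-1]
print(plan)  # the plan detail names ix_order_detail_order_id
```

As with any hint, this pins a choice the optimizer would normally make from statistics, so it should be a last resort backed by investigation of the plan.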

Is database index useful if query returns all rows?

If I use an indexed column in a query which returns all the rows, is it advantageous to use the indexed column in the WHERE clause?
For example: select * from table1 where Salary > 1;
If the salary of all the employees is greater than 1, is it advantageous to use the indexed column in the WHERE clause?
Are indexes a big overhead on inserts if the database is most likely to be used as above?
Indexes are useless when you perform full scans without an ORDER BY. Oracle index-organized tables, PostgreSQL clustered tables, and MySQL (InnoDB) PRIMARY KEY clustered indexes give a big performance boost for ORDER BY.
http://dev.mysql.com/doc/refman/5.1/en/innodb-index-types.html
"Are Indexed a big overhead while inserting if the Database is most likely to be used as above ?"
If the index fits into RAM, everything is OK.
If your query returns all the rows and all the columns and has no ordering, an index would not help this query at all. There would be no point putting in bogus predicates that always evaluate to true.
The form of the question seems a bit confusing. I don't think you are really asking whether it is advantageous to use a certain WHERE clause. Surely if you want to return rows where Salary > 1 then it's advantageous to specify that - otherwise you might not get back what you expect (the data might have changed)!
I assume what you really mean to ask is whether this query could perform better with a index than without it. I don't think we necessarily have enough information to answer that. We don't know what DBMS you are using or how the tables are stored or what type of index might be used. For example if the query is a query on a view and the view is indexed or the table(s) underlying it are indexed then indexes could make a big difference to the performance.
However, it's usually poor practice to specify SELECT * in a query. Specifying SELECT * means you will get all columns, even if the set of columns has changed. Better to specify only those columns you need because that way you stand a better chance of getting the most efficient execution plan.
I wonder if #iddqd's answer is correct.
The criterion WHERE Salary > 1 would require an index, wouldn't it?
How does MySQL know that all salaries are greater than 1 unless it a) has an index or b) does a full table scan?
If you know for a fact that you will need to perform a full table scan, then you could use:
IGNORE INDEX (index_list)
Note MySQL does not pay attention to the above directive for ORDER BY and GROUP BY before version 5.1.17.
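As a runnable sketch of that last point, SQLite's NOT INDEXED is the counterpart of MySQL's IGNORE INDEX: it deliberately forces the full table scan you know you need (the schema here is invented, mirroring the Salary example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany("INSERT INTO employee (salary) VALUES (?)",
                 [(s,) for s in range(1, 101)])
conn.execute("CREATE INDEX employee_salary_ix ON employee (salary)")

# With the index available the planner may or may not use it for
# 'salary > 1', depending on how selective it estimates the predicate to be.
with_index = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE salary > 1").fetchone()[-1]

# NOT INDEXED forces the scan, like IGNORE INDEX in MySQL.
forced_scan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee NOT INDEXED WHERE salary > 1"
).fetchone()[-1]
print(with_index)
print(forced_scan)
```

When a predicate matches essentially every row, as in the question, the scan the directive forces is usually what the optimizer would have picked anyway.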

what is the fastest way of getting table record count with condition on SQL Server

As per the subject, I am looking for a fast way to count records in a table, without a table scan, with a WHERE condition.
There are different methods, the most reliable one is
Select count(*) from table_name
But other than that you can also use one of the followings
select sum(1) from table_name
select count(1) from table_name
select rows from sysindexes where object_name(id)='table_name' and indid<2
exec sp_spaceused 'table_name'
DBCC CHECKTABLE('table_name')
The last two need sysindexes to be updated; run the following to achieve this. If you don't update them, it is highly likely you'll get wrong results, but as an approximation they might actually work.
DBCC UPDATEUSAGE ('database_name','table_name') WITH COUNT_ROWS.
EDIT: Sorry, I did not read the part about counting with a certain clause. I agree with Cruachan: the solution to your problem is proper indexes.
The following page lists 4 methods of getting the number of rows in a table, with commentary on accuracy and speed.
http://blogs.msdn.com/b/martijnh/archive/2010/07/15/sql-server-how-to-quickly-retrieve-accurate-row-count-for-table.aspx
This is the one Management Studio uses:
SELECT CAST(p.rows AS float)
FROM sys.tables AS tbl
INNER JOIN sys.indexes AS idx ON idx.object_id = tbl.object_id and idx.index_id < 2
INNER JOIN sys.partitions AS p ON p.object_id=CAST(tbl.object_id AS int)
AND p.index_id=idx.index_id
WHERE ((tbl.name=N'Transactions'
AND SCHEMA_NAME(tbl.schema_id)='dbo'))
Simply, ensure that your table is correctly indexed for the where condition.
If you're concerned over this sort of performance the approach is to create indexes which incorporate the field in question, for example if your table contains a primary key of foo, then fields bar, parrot and shrubbery and you know that you're going to need to pull back records regularly using a condition based on shrubbery that just needs data from this field you should set up a compound index of [shrubbery, foo]. This way the rdbms only has to query the index and not the table. Indexes, being tree structures, are far faster to query against than the table itself.
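The covering-index idea can be sketched in SQLite (invented schema, reusing the field names above). In SQLite an index implicitly includes the row id, so an index on the filtered column alone covers a COUNT(*) and the table itself is never touched:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE garden (foo INTEGER PRIMARY KEY, bar TEXT,
                parrot TEXT, shrubbery TEXT)""")
conn.executemany(
    "INSERT INTO garden (bar, parrot, shrubbery) VALUES (?, ?, ?)",
    [("b", "p", "nice" if i % 4 == 0 else "plain") for i in range(200)])
conn.execute("CREATE INDEX garden_shrub_ix ON garden (shrubbery)")

# The count can be answered entirely from the index: a covering access path.
plan = conn.execute("""EXPLAIN QUERY PLAN
    SELECT count(*) FROM garden WHERE shrubbery = 'nice'""").fetchone()[-1]
n = conn.execute(
    "SELECT count(*) FROM garden WHERE shrubbery = 'nice'").fetchone()[0]
print(plan)
print(n)
```

The plan detail reports a COVERING INDEX search, which is the "query the index and not the table" behaviour described above.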
How much actual activity the rdbms needs depends on the rdbms itself and precisely what information it puts into the index. For example, a SELECT COUNT(*) on an unindexed table without a WHERE condition will on most RDBMSs return instantly, as the record count is held at the table level and a table scan is not required. Analogous considerations may hold for index access.
Be aware that indexes do carry a maintenance overhead in that if you update a field the rdbms has to update all indexes containing that field too. This may or may not be a critical consideration, but it's not uncommon to see tables where most activity is read and insert/update/delete activity is of lesser importance which are heavily indexed on various combinations of table fields such that most queries will just use the indexes and not touch the actual table data itself.
ADDED: If you are using indexed access on a table that does have significant IUD activity, then just make sure you are scheduling regular maintenance. Tree structures, i.e. indexes, are most efficient when balanced, and with significant IUD activity periodic maintenance is needed to keep them that way.

How to Speed Up Simple Join

I am no good at SQL.
I am looking for a way to speed up a simple join like this:
SELECT
E.expressionID,
A.attributeName,
A.attributeValue
FROM
attributes A
JOIN
expressions E
ON
E.attributeId = A.attributeId
I am doing this tens of thousands of times, and it takes longer and longer as the tables get bigger.
I am thinking indexes - If I was to speed up selects on the single tables I'd probably put nonclustered indexes on expressionID for the expressions table and another on (attributeName, attributeValue) for the attributes table - but I don't know how this could apply to the join.
EDIT: I already have a clustered index on expressionId (PK), attributeId (PK, FK) on the expressions table and another clustered index on attributeId (PK) on the attributes table
I've seen this question but I am asking for something more general and probably far simpler.
Any help appreciated!
You definitely want to have indexes on attributeID on both the attributes and expressions table. If you don't currently have those indexes in place, I think you'll see a big speedup.
In fact, because there are so few columns being returned, I would consider a covering index for this query, i.e. an index that includes all the fields in the query.
Some things you need to care about are indexes, the query plan and statistics.
Put indexes on attributeId. Or, make sure indexes exist where attributeId is the first column in the key (SQL Server can still use indexes if it's not the 1st column, but it's not as fast).
Highlight the query in Query Analyzer and hit ^L to see the plan. You can see how tables are joined together. Almost always, using indexes is better than not (there are fringe cases where if a table is small enough, indexes can slow you down -- but for now, just be aware that 99% of the time indexes are good).
Pay attention to the order in which tables are joined. SQL Server maintains statistics on table sizes and will determine which one is better to join first. Do some investigation on internal SQL Server procedures to update statistics -- it's been too long so I don't have that info handy.
That should get you started. Really, an entire chapter can be written on how a database can optimize even such a simple query.
I bet your problem is the huge number of rows that are being inserted into that temp table. Is there any way you can add a WHERE clause before you SELECT every row in the database?
Another thing to do is add some indexes like this:
attributes.{attributeId, attributeName, attributeValue}
expressions.{attributeId, expressionID}
This is hacky! But useful if it's a last resort.
What this does is create a query plan that can be "entirely answered" by indexes. Usually, an index actually causes a double-I/O in your above query: one to hit the index (i.e. probe into the table), another to fetch the actual row referred to by the index (to pull attributeName, etc).
This is especially helpful if "attributes" or "expressions" is a wide table, that is, a table that's expensive to fetch rows from.
Finally, the best way to speed your query is to add a WHERE clause!
If I'm understanding your schema correctly, you're stating that your tables kinda look like this:
Expressions: PK - ExpressionID, AttributeID
Attributes: PK - AttributeID
Assuming that each PK is a clustered index, that still means that an Index Scan is required on the Expressions table. You might want to consider creating an Index on the Expressions table such as: AttributeID, ExpressionID. This would help to stop the Index Scanning that currently occurs.
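As a rough SQLite sketch of that suggestion (schema reduced to the columns in the question), an index on (attributeId, expressionID) gives the optimizer an ordered access path on expressions that also covers the selected column; printing the plan shows what it chose:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE attributes (attributeId INTEGER PRIMARY KEY,
                attributeName TEXT, attributeValue TEXT)""")
conn.execute("CREATE TABLE expressions (expressionID INTEGER, attributeId INTEGER)")
conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)",
                 [(i, "name%d" % i, "val%d" % i) for i in range(100)])
conn.executemany("INSERT INTO expressions VALUES (?, ?)",
                 [(i, i % 100) for i in range(1000)])

# The suggested index: join key first, then the selected column.
conn.execute("CREATE INDEX expr_attr_ix ON expressions (attributeId, expressionID)")

sql = """SELECT E.expressionID, A.attributeName, A.attributeValue
         FROM attributes A JOIN expressions E ON E.attributeId = A.attributeId"""
rows = conn.execute(sql).fetchall()
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]
print(len(rows))
print(plan)
```

The join itself is unchanged; only the available access paths differ, which is why adding the index needs no query rewrite.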
Tips:
If you want to speed up your query using a join:
For an INNER JOIN,
don't put the filter in the WHERE condition; use it in the ON condition instead.
Eg:
select id,name from table1 a
join table2 b on a.name=b.name
where id='123'
Try,
select id,name from table1 a
join table2 b on a.name=b.name and a.id='123'
For a LEFT/RIGHT JOIN,
don't put the filter in the ON condition. With an outer join, a filter in ON does not remove rows from the preserved table, so you still get all of its rows back. Put the filter in the WHERE clause instead.
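The outer-join caveat is a semantic difference, not just a performance one. A minimal SQLite sketch (invented tables, mirroring the example above) shows that the same filter returns different row counts depending on where it goes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT, name TEXT)")
conn.execute("CREATE TABLE table2 (name TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [("123", "a"), ("456", "b")])
conn.execute("INSERT INTO table2 VALUES ('a')")

# Filter in ON: every table1 row survives; the condition only controls
# which table2 rows pair up with it.
in_on = conn.execute("""SELECT table1.id FROM table1
    LEFT JOIN table2 ON table1.name = table2.name
                    AND table1.id = '123'""").fetchall()

# Filter in WHERE: applied after the join, so non-matching table1 rows drop out.
in_where = conn.execute("""SELECT table1.id FROM table1
    LEFT JOIN table2 ON table1.name = table2.name
    WHERE table1.id = '123'""").fetchall()
print(len(in_on), len(in_where))
```

So for outer joins, pick the placement by the result you want, and only then worry about speed.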
