I have a table (TAB1) with a column named IDENTIFIER, and the table has an index on that column. Whenever I query a single value with a simple WHERE clause, the explain plan shows that the existing index on the column is used.
But when the list of values I want to query is held in another table, say a temporary table (TEMP_IDENTIFIER) containing all the identifiers, and I write the query against TAB1 with an IN clause, the explain plan no longer considers the index; instead it performs a full table scan on the table.
Ideally I would like the second query to use the existing index as well.
Both queries and their explain plans are below.
Query 1
explain plan for
select * from schemaowner.TAB1
where IDENTIFIER = 'A';
Explain Plan
Plan hash value: 4172144893
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 51 | 12750 | 11 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB1 | 51 | 12750 | 11 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | COL_INDEX | 51 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("IDENTIFIER"='A')
Query 2
explain plan for
select * from schemaowner.TAB1
where IDENTIFIER in (select IDENTIFIER from SCHEMAOWNER.temp_IDENTIFIER);
Explain Plan:
Plan hash value: 935676029
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3135K| 822M| | 74751 (1)| 00:14:58 |
|* 1 | HASH JOIN RIGHT SEMI| | 3135K| 822M| 2216K| 74751 (1)| 00:14:58 |
| 2 | TABLE ACCESS FULL | TEMP_IDENTIFIER | 61115 | 1492K| | 85 (2)| 00:00:02 |
| 3 | TABLE ACCESS FULL | TAB1 | 3745K| 893M| | 28028 (2)| 00:05:37 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("IDENTIFIER"="IDENTIFIER")
Note
-----
- dynamic sampling used for this statement (level=2)
That's the beauty of the optimizer. It has figured out (or, rather, costed) that a hash semi-join is the most efficient method here :) With roughly 61,000 identifiers in TEMP_IDENTIFIER against a 3.7-million-row TAB1, driving an index lookup for each identifier is costed as more expensive than one full scan feeding a hash semi-join.
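If you want to see how the optimizer costs the index-driven alternative, a hedged sketch (reusing the table and index names from the plans above) is to hint the index and compare the resulting plan:
explain plan for
select /*+ index(t COL_INDEX) */ *
from schemaowner.TAB1 t
where t.IDENTIFIER in (select IDENTIFIER from SCHEMAOWNER.temp_IDENTIFIER);
select * from table(dbms_xplan.display);
If the hinted plan comes out with a higher cost than 74751, then the full scan plus semi-join really is the cheaper choice as far as the optimizer is concerned.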
I have tableA, which is list-partitioned almost evenly across 5 values. tableA contains 100 million rows and has a local (partitioned) index on customFunc(x). The following query does an INDEX RANGE SCAN using that index, takes about 5-10 seconds to execute, and returns a count of about 5 million.
select count(*) from tableA where customFunc(x)='abc';
Unfortunately, when I try to execute the same query against a specific partition, it does a full table scan and takes forever:
select count(*) from tableA where customFunc(x)='abc' and partitioning_key='DT';
I completely don't understand why it works that way. Shouldn't it take advantage of partition pruning in the second case?
EDIT: Adding a hint /*+ index(tableA mentionedIndex) */ solves the problem, but I still don't understand why the index is not used by default.
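For reference, the hinted form is presumably just the statement above with the hint added (mentionedIndex standing in for the real index name, which shows up as CUSTOM_FUNC_INDEX in the first plan below):
select /*+ index(tableA mentionedIndex) */ count(*)
from tableA
where customFunc(x)='abc' and partitioning_key='DT';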
EDIT: XPLAN 1
Plan hash value: xxx
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 29335 (1)| 00:00:02 | | |
| 1 | SORT AGGREGATE | | 1 | 17 | | | | |
| 2 | PARTITION LIST ALL| | 5227K| 84M| 29335 (1)| 00:00:02 | 1 | 5 |
|* 3 | INDEX RANGE SCAN | CUSTOM_FUNC_INDEX | 5227K| 84M| 29335 (1)| 00:00:02 | 1 | 5 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(customFunc(x)='abc')
XPLAN 2 (with partition key)
Plan hash value: yyy
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 679K (2)| 00:00:27 | | |
| 1 | SORT AGGREGATE | | 1 | 30 | | | | |
| 2 | PARTITION LIST SINGLE| | 4014K| 114M| 679K (2)| 00:00:27 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | tableA | 4014K| 114M| 679K (2)| 00:00:27 | 1 | 1 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(customFunc(x)='abc')
Shouldn't it take advantage of partition pruning in the second case?
The second query does apply partition pruning: that is what the PARTITION LIST SINGLE step means. The catch is that partition pruning still means reading the whole partition: in the second plan, the TABLE ACCESS FULL step means reading all the rows in the partition without using an index. Consequently, the second query is evaluating customFunc(x)='abc' for every row in the partition.
What is the point of creating a local index with the partitioning key?
The difference is that a local index prefixed with the partitioning key will always use partition pruning, whereas with a local index that does not include the partitioning key, the optimizer can choose whether to apply partition pruning. But if you want to run queries that don't use the partition key, then clearly you need the non-prefixed version.
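A hedged illustration of the two flavours (tableA, x, customFunc and partitioning_key are the names from the question; the index names are made up):
create index tablea_loc_prefixed on tableA (partitioning_key, customFunc(x)) local;   -- prefixed: partition key leads
create index tablea_loc_nonprefixed on tableA (customFunc(x)) local;                  -- non-prefixed: usable without the partition key
The non-prefixed version (which is what you appear to have) supports queries that omit the partition key, at the cost of leaving the pruning decision to the optimizer.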
Now, you're right to be puzzled. Given the partition key as a predicate, the optimizer ought to have executed an INDEX RANGE SCAN against the indicated partition. Figuring out why it doesn't will require more effort on your part. It may be that your statistics are stale, or that you need to gather histograms. Maybe the fact that it's a function-based index confuses the optimizer. If you have the access, or a co-operative DBA, you can use the 10053 event to look under the hood.
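If you go down the 10053 route, a minimal sketch (assuming you have ALTER SESSION privilege; the trace file appears in the session's trace directory) looks like this:
alter session set events '10053 trace name context forever, level 1';
explain plan for
select count(*) from tableA where customFunc(x)='abc' and partitioning_key='DT';
alter session set events '10053 trace name context off';
The resulting trace lists the access paths the optimizer considered and the cost it assigned to each, which should reveal why the index range scan lost out to the full partition scan.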
I asked this question (SQL Query to select one set when there are duplicates) last year and got a solution for counting the SLAs: basically, count the number of rows with the minimum SLA for each application. However, I have a follow-up question. I want a query that will return the rows with the minimum SLA and the earliest date for each REF_ID (or APP_ID):
ID | REF_ID | APP_ID | FIRST_DATE | SECOND_DTE | SLA |
1 | 11 | 101 | 2016/10/01 | 2016/10/02 | 1 |
2 | 12 | 102 | 2016/10/01 | 2016/10/04 | 2 |
3 | 12 | 102 | 2016/10/01 | 2016/10/05 | 2 |
So the query should return the first and second rows.
I would very much appreciate it if someone could provide a solution.
I have updated the query based on User726720's answer. It does not return entire rows, but it returns sufficient data.
SELECT REF_ID, MIN(SECOND_DTE), MIN(SLA) FROM TABLE WHERE FIRST_DTE > '2016-10-01' AND FIRST_DTE < '2016-11-01' GROUP BY REF_ID
This should do the job:
SELECT * FROM table
WHERE SLA = (SELECT MIN(SLA) FROM table)
  AND SECOND_DTE = (SELECT MIN(SECOND_DTE) FROM table)
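Note that the two scalar subqueries return the global minimum SLA and the global minimum SECOND_DTE, so this only matches the requirement when every REF_ID happens to share those minima. A hedged per-REF_ID sketch using an analytic function (your_table is a placeholder for the real table name; column names follow the sample data):
SELECT ID, REF_ID, APP_ID, FIRST_DATE, SECOND_DTE, SLA
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY REF_ID ORDER BY SLA, SECOND_DTE) AS rn
    FROM your_table t
) ranked
WHERE rn = 1;
This keeps one full row per REF_ID: the one with the lowest SLA, ties broken by the earliest SECOND_DTE, which returns rows 1 and 2 from the sample data.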
Let's suppose I have the following table with a clustered index on a column (say, a)
CREATE TABLE Tmp
(
a int,
constraint pk_a primary key clustered (a)
)
Then, let's assume that I have two sets with a very large number of rows to insert into the table:
1st set) values are sequentially increasing (i.e., {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., 999999997, 999999998, 999999999})
2nd set) values are sequentially decreasing (i.e., {999999999, 999999998, 999999997, ..., 3, 2, 1, 0})
Do you think there would be a performance difference between inserting the values in the first set and the second set? If so, why?
Thanks.
SQL Server will generally try to sort large inserts into clustered index order prior to the insert anyway.
If the source for the insert is a table variable, however, then it will not take account of the cardinality unless the statement is recompiled after the table variable is populated. Without this, it will assume the insert will only be one row.
The script below demonstrates three possible scenarios:
1. The insert source is already in exactly the correct order.
2. The insert source is in exactly reversed order.
3. The insert source is in exactly reversed order, but OPTION (RECOMPILE) is used so SQL Server compiles a plan suited to inserting 1,000,000 rows.
Execution Plans
The third one has a sort operator to get the inserted values into clustered index order first.
/*Create three separate identical tables*/
CREATE TABLE Tmp1(a int primary key clustered)
CREATE TABLE Tmp2(a int primary key clustered)
CREATE TABLE Tmp3(a int primary key clustered)
DBCC FREEPROCCACHE;
GO
DECLARE @Source TABLE (N INT PRIMARY KEY)
INSERT INTO @Source
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 0))
FROM sys.all_columns c1, sys.all_columns c2, sys.all_columns c3
SET STATISTICS TIME ON;
PRINT 'Tmp1'
INSERT INTO Tmp1
SELECT TOP (1000000) N
FROM @Source
ORDER BY N
PRINT 'Tmp2'
INSERT INTO Tmp2
SELECT TOP (1000000) 1000000 - N
FROM @Source
ORDER BY N
PRINT 'Tmp3'
INSERT INTO Tmp3
SELECT 1000000 - N
FROM @Source
ORDER BY N
OPTION (RECOMPILE)
SET STATISTICS TIME OFF;
Verify Results and clean up
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp1'), 1, NULL, 'DETAILED')
WHERE index_level = 0
UNION ALL
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp2'), 1, NULL, 'DETAILED')
WHERE index_level = 0
UNION ALL
SELECT object_name(object_id) AS name,
page_count,
avg_fragmentation_in_percent,
fragment_count,
avg_fragment_size_in_pages
FROM
sys.dm_db_index_physical_stats(db_id(), object_id('Tmp3'), 1, NULL, 'DETAILED')
WHERE index_level = 0
DROP TABLE Tmp1, Tmp2, Tmp3
STATISTICS TIME ON results
+------+----------+--------------+
| | CPU Time | Elapsed Time |
+------+----------+--------------+
| Tmp1 | 6718 ms | 6775 ms |
| Tmp2 | 7469 ms | 7240 ms |
| Tmp3 | 7813 ms | 9318 ms |
+------+----------+--------------+
Fragmentation Results
+------+------------+------------------------------+----------------+----------------------------+
| name | page_count | avg_fragmentation_in_percent | fragment_count | avg_fragment_size_in_pages |
+------+------------+------------------------------+----------------+----------------------------+
| Tmp1 | 3345 | 0.448430493 | 17 | 196.7647059 |
| Tmp2 | 3345 | 99.97010463 | 3345 | 1 |
| Tmp3 | 3345 | 0.418535127 | 16 | 209.0625 |
+------+------------+------------------------------+----------------+----------------------------+
Conclusion
In this case all three of them ended up using exactly the same number of pages. However, Tmp2 is 99.97% fragmented, compared with only about 0.4% for the other two. The insert to Tmp3 took the longest, as it required an additional sort step first, but this one-time cost needs to be set against the benefit of minimal fragmentation for future scans against the table.
The reason why Tmp2 is so heavily fragmented can be seen from the query below:
WITH T AS
(
SELECT TOP 3000 file_id, page_id, a
FROM Tmp2
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
ORDER BY a
)
SELECT file_id, page_id, MIN(a), MAX(a)
FROM T
group by file_id, page_id
ORDER BY MIN(a)
With zero logical fragmentation, the page holding the next highest key value would be the next highest page in the file, but here the pages are in exactly the opposite order of what they are supposed to be.
+---------+---------+--------+--------+
| file_id | page_id | Min(a) | Max(a) |
+---------+---------+--------+--------+
| 1 | 26827 | 0 | 143 |
| 1 | 26826 | 144 | 442 |
| 1 | 26825 | 443 | 741 |
| 1 | 26824 | 742 | 1040 |
| 1 | 26823 | 1041 | 1339 |
| 1 | 26822 | 1340 | 1638 |
| 1 | 26821 | 1639 | 1937 |
| 1 | 26820 | 1938 | 2236 |
| 1 | 26819 | 2237 | 2535 |
| 1 | 26818 | 2536 | 2834 |
| 1 | 26817 | 2835 | 2999 |
+---------+---------+--------+--------+
The rows arrived in descending order, so, for example, values 2834 down to 2536 were put into page 26818; a new page was then allocated for 2535, but this was page 26819 rather than page 26817.
One possible reason why the insert to Tmp2 took longer than the insert to Tmp1 is that, because the rows are being inserted in exactly reverse order on each page, every insert to Tmp2 means the slot array on the page needs to be rewritten, with all previous entries moved up to make room for the new arrival.
To answer this question, you only need to look up what effect clustering has on data and the manner in which it is logically ordered. By clustering ascending, higher numbers get added onto the end of the table, so inserts will be very fast. When inserting in reverse order, each row is inserted in between two other records (read up on page splitting); this will result in slower inserts. This actually has other negative effects as well (read up on fill factor).
It has to do with pages being allocated sequentially, as is done for a clustered index. With the first set they would naturally cluster together. But with the second, I think you would have to keep moving the page locations to keep them sequentially ascending. However, I really only understand SQL Server at a conceptual level, so you'd have to test.
I have a query:
select min(timestamp) from table
This table has 60+ million rows, and daily I delete a few off the end. To determine whether or not there is any data old enough to delete, I run the query above. There is an ascending index on timestamp, containing only that one column, and the query plan in Oracle shows this as a full index scan. Should this not be the definition of a seek?
Edit, including the plan:
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 8 | 4 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 2 | INDEX FULL SCAN (MIN/MAX)| NEVENTS_I2 | 1 | 8 | 4 (100)| 00:00:01 |
Can you post the actual query plan? Are you sure that it is not doing a min/max index full scan? As you can see in this example, we're getting the MIN value from a 100,000 row table using a min/max index full scan with only a handful of consistent gets.
SQL> create table foo (
2 col1 date not null
3 );
Table created.
SQL> insert into foo
2 select sysdate + level
3 from dual
4 connect by level <= 100000;
100000 rows created.
SQL> create index idx_foo_col1
2 on foo( col1 );
Index created.
SQL> analyze table foo compute statistics for all indexed columns;
Table analyzed.
SQL> set autotrace on;
<<Note that I ran this statement once just to get the delayed block cleanout to
happen so that the consistent gets number wouldn't be skewed. You could run a
different query as well>>
1* select min(col1) from foo
SQL> /
MIN(COL1)
---------
02-FEB-11
Execution Plan
----------------------------------------------------------
Plan hash value: 817909383
---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |              |     1 |     7 |     2   (0)| 00:00:01 |
|   1 | SORT AGGREGATE             |              |     1 |     7 |            |          |
|   2 | INDEX FULL SCAN (MIN/MAX)  | IDX_FOO_COL1 |     1 |     7 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
532 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
At first I thought that the index would only be used if the column is declared NOT NULL. I tested with the following setup:
SQL> CREATE TABLE my_table (ts TIMESTAMP);
Table created
SQL> INSERT INTO my_table
2 SELECT systimestamp + ROWNUM * INTERVAL '1' SECOND
3 FROM dual CONNECT BY LEVEL <= 100000;
100000 rows inserted
SQL> CREATE INDEX ix ON my_table(ts);
Index created
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 69 (2)| 00:00:0
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| |
--------------------------------------------------------------------------------
Here we notice that the index is used, but all rows from the index are read. If we specify that the column is not null we get a much better plan:
SQL> ALTER TABLE my_table MODIFY ts NOT NULL;
Table altered
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:0
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| 2 (0)| 00:00:0
--------------------------------------------------------------------------------
In fact this is the same plan that is also used if we add a WHERE clause (Oracle will read a single row from the index):
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table WHERE ts IS NOT NULL;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | FIRST ROW | | 90958 | 1154K| 2 (0)| 00:00:
| 3 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| 2 (0)| 00:00:
--------------------------------------------------------------------------------
This last plan shows (line 2) that Oracle is indeed performing a "seek".
Just wanted to home in on the fact that an "INDEX FULL SCAN (MIN/MAX)" is simply not the same as an "INDEX FULL SCAN". An INDEX FULL SCAN really does scan the entire index (possibly with filtering). However, an INDEX FULL SCAN (MIN/MAX) or INDEX RANGE SCAN (MIN/MAX) only reads the smallest or largest leaf block (from the range), but it can only be employed as long as the column is NOT NULL (which is a bit silly, and really a bug, since a NULL value is by definition neither the smallest nor the largest value). The (MIN/MAX) optimization is an implicit FIRST_ROWS action and doesn't need the "WHERE ... IS NOT NULL" query condition to perform the optimization. Interestingly, the MIN/MAX optimization is normally not considered by the CBO for function-based indexes; that's another little bug.
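If you want to verify the function-based index behaviour on your own version, a small, unverified sketch in the style of the earlier examples (ix_fbi is a made-up index name; TRUNC is used only as an arbitrary indexable function):
SQL> CREATE INDEX ix_fbi ON my_table (TRUNC(ts));
SQL> EXPLAIN PLAN FOR SELECT MIN(TRUNC(ts)) FROM my_table;
SQL> SELECT * FROM TABLE(dbms_xplan.display);
If the MIN/MAX optimization is skipped for the function-based index, the plan will show a plain INDEX FULL SCAN (or a full table scan) rather than INDEX FULL SCAN (MIN/MAX).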