Index is not being used on a partitioned table - database

I have tableA, which is list-partitioned almost evenly by 5 values. tableA contains 100 million rows and has a local (partitioned) index on customFunc(x). The following query does a RANGE SCAN using that index, takes about 5-10s to execute, and returns a count of about 5 million.
select count(*) from tableA where customFunc(x)='abc';
Unfortunately, when I try to execute the same query against a specific partition, it does a full table scan and takes forever.
select count(*) from tableA where customFunc(x)='abc' and partitioning_key='DT';
I don't understand at all why it works that way. Shouldn't it take advantage of partition pruning in the 2nd case?
EDIT: Adding a hint /*+ index(tableA mentionedIndex) */ solves the problem, but I still don't understand why the index is not used by default.
EDIT: XPLAN 1
Plan hash value: xxx
---------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 29335 (1)| 00:00:02 | | |
| 1 | SORT AGGREGATE | | 1 | 17 | | | | |
| 2 | PARTITION LIST ALL| | 5227K| 84M| 29335 (1)| 00:00:02 | 1 | 5 |
|* 3 | INDEX RANGE SCAN | CUSTOM_FUNC_INDEX | 5227K| 84M| 29335 (1)| 00:00:02 | 1 | 5 |
---------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access(customFunc(x)='abc')
XPLAN 2 (with partition key)
Plan hash value: yyy
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 30 | 679K (2)| 00:00:27 | | |
| 1 | SORT AGGREGATE | | 1 | 30 | | | | |
| 2 | PARTITION LIST SINGLE| | 4014K| 114M| 679K (2)| 00:00:27 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | tableA | 4014K| 114M| 679K (2)| 00:00:27 | 1 | 1 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter(customFunc(x)='abc')

Shouldn't it take an advantage of partition pruning in the 2nd case?
The second query does apply partition pruning: that is what the PARTITION LIST SINGLE step means. The catch is that pruning only decides which partitions are read, not how they are read. In the second plan the TABLE ACCESS FULL step reads every row in that one partition without using the index, so the query evaluates customFunc(x)='abc' for every row in the partition.
What is the point of creating a local index with partitioning key?
The difference is that a local index prefixed with the partitioning key will always use partition pruning, whereas when a local index doesn't have the partitioning key the optimiser can choose whether to apply partition pruning. But if you want to run queries that don't use the partition key then clearly you need the non-prefixed version.
Now, you're right to be puzzled. Given the partition key as a predicate, the optimizer ought to have executed an INDEX RANGE SCAN against the indicated partition. Figuring out why it doesn't will require more effort on your part. Your statistics may be stale, or you may need to gather histograms. The fact that it's a function-based index may also be confusing the optimizer. If you have the access, or a co-operative DBA, you can use the 10053 event to look under the hood.
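The checks suggested above could be run roughly as follows. This is only a sketch under assumptions: it presumes tableA lives in the current schema, and that a histogram on the hidden virtual column behind the function-based index is what the optimizer is missing, which may not be the actual cause.

```sql
-- 1. Refresh statistics on the table, its partitions and its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,        -- assumption: table is in the current schema
    tabname     => 'TABLEA',
    granularity => 'ALL',       -- global and partition-level statistics
    cascade     => TRUE);       -- also gather statistics on the indexes

  -- The function-based index on customFunc(x) is backed by a hidden virtual
  -- column; gather a histogram on it as well.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'TABLEA',
    method_opt  => 'FOR ALL HIDDEN COLUMNS SIZE 254');
END;
/

-- 2. Re-check the plan for the partition-restricted query.
EXPLAIN PLAN FOR
SELECT COUNT(*) FROM tableA
WHERE customFunc(x) = 'abc' AND partitioning_key = 'DT';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- 3. If the full scan is still chosen, capture a 10053 optimizer trace.
--    The trace is written only on a hard parse, hence the extra comment
--    that makes the statement text new.
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

SELECT /* trace_10053 */ COUNT(*) FROM tableA
WHERE customFunc(x) = 'abc' AND partitioning_key = 'DT';

ALTER SESSION SET EVENTS '10053 trace name context off';
```

The trace file is written to the session's trace directory on the database server and shows the cost the optimizer computed for each access path it considered, including the index range scan it rejected.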

Related

Column Index not reflecting in Explain Plan for predicates with "IN" Statement

I have a table (TAB1) with a column named IDENTIFIER, and the table has an index on this column. Whenever I query a single row using a simple WHERE clause with a single value, the explain plan shows that it uses the existing index on that column.
But when I have a list of values in another table, say a temporary table (TEMP_IDENTIFIER) holding all the identifiers I want to query, and I write a query on the same table with an IN clause, the explain plan does not consider the index. Instead it performs a full table scan on the table.
Ideally I would want the second query to use the existing index as well.
Please find both queries and their explain plans below.
Query 1
explain plan for
select * from schemaowner.TAB1
where IDENTIFIER = 'A';
Explain Plan
Plan hash value: 4172144893
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 51 | 12750 | 11 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB1 | 51 | 12750 | 11 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | COL_INDEX | 51 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("IDENTIFIER"='A')
Query 2
explain plan for
select * from schemaowner.TAB1
where IDENTIFIER in (select IDENTIFIER from SCHEMAOWNER.temp_IDENTIFIER);
Explain Plan :
Plan hash value: 935676029
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3135K| 822M| | 74751 (1)| 00:14:58 |
|* 1 | HASH JOIN RIGHT SEMI| | 3135K| 822M| 2216K| 74751 (1)| 00:14:58 |
| 2 | TABLE ACCESS FULL | TEMP_IDENTIFIER | 61115 | 1492K| | 85 (2)| 00:00:02 |
| 3 | TABLE ACCESS FULL | TAB1 | 3745K| 893M| | 28028 (2)| 00:05:37 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("IDENTIFIER"="IDENTIFIER")
Note
-----
- dynamic sampling used for this statement (level=2)
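That's the beauty of the optimizer. It has figured out (or rather costed) that a hash SEMI join is the most efficient method :) It expects the subquery to match about 3135K of the 3745K rows in TAB1, and for that many rows two full scans plus a hash join cost less than probing the index row by row.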
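The "dynamic sampling used" note in the second plan usually means that at least one of the tables had no optimizer statistics, so the 3135K-row estimate may be far off. A hedged way to check is to gather statistics and re-run the explain plan. This sketch assumes TEMP_IDENTIFIER is an ordinary table (a global temporary table needs more care) and that you are allowed to gather statistics in SCHEMAOWNER.

```sql
BEGIN
  -- Statistics for the table holding the list of identifiers.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SCHEMAOWNER',
    tabname => 'TEMP_IDENTIFIER');

  -- Statistics for the main table and its indexes.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'SCHEMAOWNER',
    tabname => 'TAB1',
    cascade => TRUE);
END;
/

EXPLAIN PLAN FOR
select * from schemaowner.TAB1
where IDENTIFIER in (select IDENTIFIER from SCHEMAOWNER.temp_IDENTIFIER);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the estimate stays in the millions, the full scans and hash join are probably the right choice. If it drops sharply, the optimizer may switch to an index-driven plan on its own.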

How do I set 2 columns so each entry is unique against both columns?

I have a record that holds 2 license "keys" (actually GUIDs). When a request comes to our service it includes a key (GUID) in the request. I then do a query looking for a record that has this value in either the column Key1 or Key2.
The purpose of this is users will use Key1 for everything. Then they discover that Key1 has become public. So they switch to Key2 and then after 15 minutes, change the value of Key1. Now the old Key1 value is of no use.
By having the 2 keys, it allows the switch over with no downtime.
I need any key value to be unique. Not that any pair of values is unique. Not that a value in Key1 is unique in all rows for Key 1. But that a new value is unique in all rows.Key1 and rows.Key2.
Is there a way to enforce this in SQL Server? Or do I need to do this myself with a select before doing an insert or update?
-------------------------------------------------------------------------------------------
| LicenseId | ApiKey1 | ApiKey2 |
| 1 | af53d192-7fa3-4be0-b3d4-7efe17a397b5 | 1a87cc4a-1941-4af7-aeaa-bf9690f47eef |
| 2 | 5bbc2d06-ed6f-4444-aa22-73820dd6f3f6 | c2bdd9d9-fd47-4727-83f8-02ed0e7537e1 |
| 3 | 8acfa8b4-aa4b-41a7-9d3d-b6ba1eac838e | 30c18f2d-5d89-4e5d-8e8e-2d2b647d6ab6 |
-------------------------------------------------------------------------------------------
I need to ensure that if I create a record with LicenseId = 4 and ApiKey2 = 'af53d192-7fa3-4be0-b3d4-7efe17a397b5', the insert will fail because that GUID is already ApiKey1 for LicenseId = 1.
The most natural way to enforce this in the database is to put all keys in a single column. E.g.
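create table ApiKeys
(
    LicenceId int not null,
    KeyId int not null check (KeyId in (0,1)),  -- 0 = first key, 1 = second key
    KeyGuid uniqueidentifier not null unique,   -- unique across every key of every licence
    constraint pk_ApiKeys primary key (LicenceId, KeyId)
)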
Arguably, having both keys on the same row violates 1NF, and your desire for uniqueness across the two columns certainly suggests that they belong to a single domain.
So instead of storing ApiKey1 and ApiKey2 on the same row, you store them on two separate rows.
So instead of
-------------------------------------------------------------------------------------------
| LicenseId | ApiKey1 | ApiKey2 |
| 1 | af53d192-7fa3-4be0-b3d4-7efe17a397b5 | 1a87cc4a-1941-4af7-aeaa-bf9690f47eef |
| 2 | 5bbc2d06-ed6f-4444-aa22-73820dd6f3f6 | c2bdd9d9-fd47-4727-83f8-02ed0e7537e1 |
| 3 | 8acfa8b4-aa4b-41a7-9d3d-b6ba1eac838e | 30c18f2d-5d89-4e5d-8e8e-2d2b647d6ab6 |
-------------------------------------------------------------------------------------------
You would have:
----------------------------------------------------------
| LicenseId | KeyId | ApiKey |
| 1 | 0 | af53d192-7fa3-4be0-b3d4-7efe17a397b5|
| 1 | 1 | 1a87cc4a-1941-4af7-aeaa-bf9690f47eef|
| 2 | 0 | 5bbc2d06-ed6f-4444-aa22-73820dd6f3f6|
| 2 | 1 | c2bdd9d9-fd47-4727-83f8-02ed0e7537e1|
| 3 | 0 | 8acfa8b4-aa4b-41a7-9d3d-b6ba1eac838e|
| 3 | 1 | 30c18f2d-5d89-4e5d-8e8e-2d2b647d6ab6|
----------------------------------------------------------
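With the keys on separate rows, the lookup, the duplicate check and the key rotation from the question might look like this. It is a minimal sketch that assumes the ApiKeys table from the create table statement above (column names LicenceId, KeyId, KeyGuid); the variable @RequestKey is made up for illustration.

```sql
-- Both keys of licence 1, one row each.
insert into ApiKeys (LicenceId, KeyId, KeyGuid) values
    (1, 0, 'af53d192-7fa3-4be0-b3d4-7efe17a397b5'),
    (1, 1, '1a87cc4a-1941-4af7-aeaa-bf9690f47eef');

-- Look up the licence for an incoming request key; it no longer matters
-- whether the caller used the first or the second key.
declare @RequestKey uniqueidentifier = '1a87cc4a-1941-4af7-aeaa-bf9690f47eef';
select LicenceId from ApiKeys where KeyGuid = @RequestKey;

-- Rejected by the unique constraint on KeyGuid:
-- this GUID is already a key of licence 1.
insert into ApiKeys (LicenceId, KeyId, KeyGuid)
values (4, 1, 'af53d192-7fa3-4be0-b3d4-7efe17a397b5');

-- Rotate the first key of licence 1 once clients have moved to the second.
update ApiKeys
set KeyGuid = newid()
where LicenceId = 1 and KeyId = 0;
```

The database now enforces uniqueness itself, so there is no need for a select before each insert or update.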

SQL Server TPH (Table Per Hierarchy) auto increment multiple columns base on type

We currently use TPT (Table Per Type) in Entity Framework. This is very slow: we have about 20 tables, and when they are queried, Entity Framework generates some massive, disgusting SQL.
Each table has an auto-increment integer column, which gives each type a number that is incremented per type. This is what the clients wanted. Now that we want to move to the more performant TPH, we need all these columns moved into the one table.
How can we have auto-increment columns based on the type, as in the results below?
e.g.
Current Job Task
| TaskId | TaskNumber |
-----------------------------
| 1234 | 1 |
| 2345 | 2 |
Current Work Task
| TaskId | TaskNumber |
-----------------------------
| 3244 | 1 |
| 3245 | 2 |
This is the TPH table structure we want, as you can see, we want the task number to increment based on the Type of task.
| TaskId | Type | JobTaskNumber | WorkTaskNumber |
---------------------------------------------------------------
| 1234 | Job | 1 | null |
| 2345 | Job | 2 | null |
| 3244 | Work | null | 1 |
| 3245 | Work | null | 2 |
I am wondering whether we should use a seeding table, but any solutions are greatly appreciated.
Many thanks
Andrew
OK, so I did what I thought would work.
It's not a hugely nice approach, as we need about 20 seed tables. Each seed table has just an identity column defined as a BIGINT in SQL Server.
When we want to generate and fetch a new incremented id, we run the following statement through Dapper and read the result.
INSERT INTO SeedMyTable DEFAULT VALUES; SELECT CAST(SCOPE_IDENTITY() AS BIGINT)
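A possible alternative to the seed tables, which is not what was used above, is one SEQUENCE object per task type (available from SQL Server 2012 onwards). The sequence names below are hypothetical.

```sql
-- One sequence per task type replaces one seed table per task type.
CREATE SEQUENCE dbo.JobTaskNumberSeq  AS BIGINT START WITH 1 INCREMENT BY 1;
CREATE SEQUENCE dbo.WorkTaskNumberSeq AS BIGINT START WITH 1 INCREMENT BY 1;

-- Fetch the next number for a Job task; this can be called through Dapper
-- in the same way as the seed-table statement.
SELECT NEXT VALUE FOR dbo.JobTaskNumberSeq AS JobTaskNumber;
```

As with identity columns, numbers taken by a rolled-back transaction are not reused, so gaps can appear in the sequence.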

Join two Select Statements into a single row when one select has n amount of entries?

Is it possible in SQL Server to take two select statements and combine them into a single row without knowing how many entries one of the select statements returns?
I've been looking at various join solutions, but they all seem to assume that the number of columns is known in advance. In my case one table has a fixed number of columns (t1) and the other table has an unknown number of entries (t2), all of which use a key that matches one entry in t1.
+----+------+-----+
| id | name | ... |
+----+------+-----+
| 1 | John | ... |
+----+------+-----+
And
+-------------+----------------+
| activity_id | account_number |
+-------------+----------------+
| 1 | 12345467879 |
| 1 | 98765432515 |
| ... | ... |
| ... | ... |
+-------------+----------------+
The number of account numbers belonging to the first query is unknown.
After the query it would become:
+----+------+-----+----------------+------------------+-----+------------------+
| id | name | ... | account_number | account_number_2 | ... | account_number_n |
+----+------+-----+----------------+------------------+-----+------------------+
| 1 | John | ... | 12345467879 | 98765432515 | ... | ... |
+----+------+-----+----------------+------------------+-----+------------------+
So I don't know how many account numbers could be associated with the id beforehand.
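One common way to get this shape is a dynamic PIVOT: number the accounts per id, build the column list from those numbers, and execute the generated statement. The following is only a sketch. It assumes SQL Server 2017 or later (for STRING_AGG) and selects just id and name from t1, because the remaining columns are elided in the question. It also names the first column account_number_1 rather than account_number.

```sql
DECLARE @cols nvarchar(max), @sql nvarchar(max);

-- Build "[account_number_1], [account_number_2], ..." up to the largest
-- number of accounts that any single id has.
SELECT @cols = STRING_AGG(
                   CAST(QUOTENAME(CONCAT('account_number_', rn)) AS nvarchar(max)),
                   ', ') WITHIN GROUP (ORDER BY rn)
FROM (
    SELECT DISTINCT
           ROW_NUMBER() OVER (PARTITION BY activity_id
                              ORDER BY account_number) AS rn
    FROM t2
) AS n;

-- Number each account within its id, then pivot those numbers into columns.
SET @sql = N'
SELECT p.*
FROM (
    SELECT t1.id, t1.name, a.account_number,
           CONCAT(''account_number_'',
                  ROW_NUMBER() OVER (PARTITION BY a.activity_id
                                     ORDER BY a.account_number)) AS col
    FROM t1
    JOIN t2 AS a ON a.activity_id = t1.id
) AS s
PIVOT (MAX(account_number) FOR col IN (' + @cols + N')) AS p;';

EXEC sys.sp_executesql @sql;
```

Rows in t1 with no matching accounts are dropped by the inner join. Ids with fewer accounts than the maximum get NULL in the unused columns.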

Why am I getting an index scan for a covered query using aggregate function?

I have a query:
select min(timestamp) from table
This table has 60+ million rows, and every day I delete a few off the end. To determine whether there is any data old enough to delete, I run the query above. There is an ascending index on timestamp containing only that one column, and the Oracle query plan shows this as a full index scan. Should this not be the definition of a seek?
edit including plan:
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 2 | INDEX FULL SCAN (MIN/MAX)| NEVENTS_I2 | 1 | 8 | 4 (100)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 8 | | |
| 0 | SELECT STATEMENT | | 1 | 8 | 4 (0)| 00:00:01 |
Can you post the actual query plan? Are you sure that it is not doing a min/max index full scan? As you can see in this example, we're getting the MIN value from a 100,000 row table using a min/max index full scan with only a handful of consistent gets.
SQL> create table foo (
2 col1 date not null
3 );
Table created.
SQL> insert into foo
2 select sysdate + level
3 from dual
4 connect by level <= 100000;
100000 rows created.
SQL> create index idx_foo_col1
2 on foo( col1 );
Index created.
SQL> analyze table foo compute statistics for all indexed columns;
Table analyzed.
SQL> set autotrace on;
<<Note that I ran this statement once just to get the delayed block cleanout to
happen so that the consistent gets number wouldn't be skewed. You could run a
different query as well>>
1* select min(col1) from foo
SQL> /
MIN(COL1)
---------
02-FEB-11
Execution Plan
----------------------------------------------------------
Plan hash value: 817909383
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 2 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | INDEX FULL SCAN (MIN/MAX)| IDX_FOO_COL1 | 1 | 7 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------
Note
-----
- dynamic sampling used for this statement (level=2)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads
0 redo size
532 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
At first I thought that the index would only be used if the column is declared NOT NULL. I tested with the following setup:
SQL> CREATE TABLE my_table (ts TIMESTAMP);
Table created
SQL> INSERT INTO my_table
2 SELECT systimestamp + ROWNUM * INTERVAL '1' SECOND
3 FROM dual CONNECT BY LEVEL <= 100000;
100000 rows inserted
SQL> CREATE INDEX ix ON my_table(ts);
Index created
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 69 (2)| 00:00:0
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| |
--------------------------------------------------------------------------------
Here we notice that the index is used, but all rows from the index are read. If we specify that the column is not null we get a much better plan:
SQL> ALTER TABLE my_table MODIFY ts NOT NULL;
Table altered
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:0
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| 2 (0)| 00:00:0
--------------------------------------------------------------------------------
In fact this is the same plan that is also used if we add a WHERE clause (Oracle will read a single row from the index):
SQL> EXPLAIN PLAN FOR SELECT MIN(ts) FROM my_table WHERE ts IS NOT NULL;
Explained
SQL> SELECT * FROM TABLE(dbms_xplan.display);
--------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 2 (0)| 00:00:
| 1 | SORT AGGREGATE | | 1 | 13 | |
| 2 | FIRST ROW | | 90958 | 1154K| 2 (0)| 00:00:
| 3 | INDEX FULL SCAN (MIN/MAX)| IX | 90958 | 1154K| 2 (0)| 00:00:
--------------------------------------------------------------------------------
This last plan shows (line 2) that Oracle is indeed performing a "seek".
Just wanted to home in on the fact that an "INDEX FULL SCAN (MIN/MAX)" is simply not the same as an "INDEX FULL SCAN". An INDEX FULL SCAN really does scan the entire index (possibly with filtering). An INDEX FULL SCAN (MIN/MAX) or INDEX RANGE SCAN (MIN/MAX), however, only gets the smallest or largest leaf block (from the range), but can only be employed as long as the column is NOT NULL (which is a bit silly, and really a bug, since a NULL value is by definition neither the smallest nor the largest value). The (MIN/MAX) optimization is an implicit FIRST_ROWS action and doesn't need the "WHERE ... IS NOT NULL" query condition to perform the optimization. Interestingly, the MIN/MAX optimization is normally not considered by the CBO for function-based indexes; that's another little bug.
