Does Oracle's Optimizer Take into Consideration a Subquery's Underlying Columns? - database

Say you have a query like so:
with subselect as (
select foo_id
from foo
)
select bar_id
from bar
join subselect on foo_id = bar_id
where foo_id = 1000
Imagine you have an index on foo_id. Is Oracle's database smart enough to use the index in the query for the line "where foo_id = 1000"? OR since foo_id is wrapped in a subquery, does Oracle lose the index information related to this column?

Perform a simple test:
create table foo as
select t.object_id as foo_id, t.* from all_objects t;
create table bar as
select t.object_id as bar_id, t.* from all_objects t;
create index foo_id_ix on foo(foo_id);
exec dbms_stats.GATHER_TABLE_STATS(ownname=>user, tabname=>'FOO', method_opt=>'FOR ALL INDEXED COLUMNS' );
explain plan for
with subselect as (
select foo_id
from foo
)
select bar_id
from bar
join subselect on foo_id = bar_id
where foo_id = 1000;
select * from table( DBMS_XPLAN.DISPLAY );
and a result of last query is:
Plan hash value: 445248211
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 366 (1)| 00:00:01 |
| 1 | MERGE JOIN CARTESIAN| | 1 | 10 | 366 (1)| 00:00:01 |
|* 2 | TABLE ACCESS FULL | BAR | 1 | 5 | 365 (1)| 00:00:01 |
| 3 | BUFFER SORT | | 1 | 5 | 1 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | FOO_ID_IX | 1 | 5 | 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("BAR_ID"=1000)
4 - access("FOO_ID"=1000)
In the above example Oracle uses |* 4 | INDEX RANGE SCAN using index: FOO_ID_IX for filter 4 - access("FOO_ID"=1000)
So the answer is:
yes, the Oracle's database is smart enough to use the index in the query for the line "where foo_id = 1000"

Related

DELETE TOP variable records with variable from grouping of another table

Say I have two tables: A and B
Table A
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 0 |
+----+-------+
Table B
+----+-------+
| id | value |
+----+-------+
| 1 | 20 |
| 2 | 10 |
| 3 | 30 |
| 4 | 20 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
If I do SELECT value, COUNT(*) AS occurrence FROM A GROUP BY value, I'll get:
+-------+------------+
| value | occurrence |
+-------+------------+
| 20 | 2 |
| 10 | 1 |
| 0 | 1 |
+-------+------------+
Based on this grouping of table A, I want to delete occurrence records from table B with the same values. In other words, I want to delete from B 2 records with value 20, 1 record with value 10, and 1 record with value 0. (Other conditions include 'do nothing if no record exists' and 'smallest id first', but I think these conditions are pretty trivial compared to the bulk of this question.)
Table B after deleting should be:
+----+-------+
| id | value |
+----+-------+
| 3 | 30 |
| 5 | 20 |
| 6 | 10 |
+----+-------+
From the official TOP documentation, doesn't seems like I can perform some JOIN to use as the TOP expression.
We could use ROW_NUMBER with CTEs here:
WITH cteA AS (
SELECT value, COUNT(*) cnt
FROM A
GROUP BY value
),
cteB AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY value ORDER BY id) rn
FROM B
)
DELETE
FROM cteB b
INNER JOIN cteA a
ON b.value = a.value
WHERE
b.rn <= a.cnt;
The logic here is that we use ROW_NUMBER to keep track of the order of each value in the B table. Then, we join to bring in the counts of each value in the A table, and we only delete B records for which the row number is strictly less than or equal to the A count.
See the demo link below to verify that the logic be correct. Note that I use a select there, not a delete, but the correct rows are being targeted for deletion.
Demo

Column Index not reflecting in Explain Plan for predicates with "IN" Statement

I have a table with column name IDENTIFIER and the table (TAB1) has an index for this column. whenever i try to query a single data using a simple where clause with single value, explain plan shows that it is utilizing an existing index on that particular column.
But whenever i have a list of values in another table, say a temporary table ( TEMP_IDENTIFIER ) with list of all identifiers that i want to query and when i frame a query on the same table with an IN clause , i could see that explain plan is not considering the index, instead it performs an full table scan on the table
Ideally i would want the second query to utilize the existing index as well
Please find the both the queries and explain plan as follows
Query 1
explain plan for
select * from schemaowner.TAB1
where IDENTIFIER = 'A';
Explain Plan
Plan hash value: 4172144893
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 51 | 12750 | 11 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TAB1 | 51 | 12750 | 11 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | COL_INDEX | 51 | | 4 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("IDENTIFIER"='A')
Query 2
explain plan for
select * from schemaowner.TAB1
where IDENTIFIER in (select IDENTIFIER from SCHEMAOWNER.temp_IDENTIFIER);
Explain Plan :
Plan hash value: 935676029
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3135K| 822M| | 74751 (1)| 00:14:58 |
|* 1 | HASH JOIN RIGHT SEMI| | 3135K| 822M| 2216K| 74751 (1)| 00:14:58 |
| 2 | TABLE ACCESS FULL | TEMP_IDENTIFIER | 61115 | 1492K| | 85 (2)| 00:00:02 |
| 3 | TABLE ACCESS FULL | TAB1 | 3745K| 893M| | 28028 (2)| 00:05:37 |
-------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("IDENTIFIER"="IDENTIFIER")
Note
-----
- dynamic sampling used for this statement (level=2)
Thats the beauty of the optimizer. It's figured out (or costed) that a SEMI join is the most efficient method :)

How does database (oracle) joins handle filter condition

When we create a join between 2 tables in Oracle, with some additional filter condition on one or both tables, will oracle join the tables first and then filter or will it filter the conditions first and then join.
Or in simple words, which of these 2 is a better query
Say we have 2 tables Employee and Department, and I want employees all employee + dept detail where employee salary is grater than 50000
Query 1:
select e.name, d.name from employee e, department d where e.dept_id=d.id and e.salary>50000;
Query 2:
select e.name, d.name from (select * from employee where salary>50000) e, department d where e.dept_id=d.id;
Generally it will filter as much as possible first. From the explain plan, you can actually see where the filtering is done, and where the joining is done, for example, create some tables and data:
create table employees (id integer, dept_id integer, salary number);
create table dept (id integer, dept_name varchar2(10));
insert into dept values (1, 'IT');
insert into dept values (2, 'HR');
insert into employees
select level, mod(level, 2) + 1, level * 1000
from dual connect by level <= 100;
create index employee_uk1 on employees (id);
create index dept_uk1 on dept (id);
exec dbms_stats.gather_table_stats(user, 'DEPT');
Now if I explain both the queries you provided, you will find that Oracle transforms each query into the same plan behind the scenes (it doesn't always execute what you think it does - Oracle has license to 'rewrite' the query, and it does it a lot):
explain plan for
select e.*, d.*
from employees e, dept d
where e.dept_id = d.id
and e.salary > 5000;
select * from table(dbms_xplan.display());
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 1536 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN | | 96 | 1536 | 6 (17)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 2 | 12 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | DEPT_UK1 | 2 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 96 | 960 | 4 (25)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | EMPLOYEES | 96 | 960 | 3 (0)| 00:00:01 |
4 - access("E"."DEPT_ID"="D"."ID")
filter("E"."DEPT_ID"="D"."ID")
5 - filter("E"."SALARY">5000)
Notice the filter operations applied to the query. Now explain the alternative query:
explain plan for
select e.*, d.*
from (select * from employees where salary > 5000) e, dept d
where e.dept_id = d.id;
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 96 | 1536 | 6 (17)| 00:00:01 |
| 1 | MERGE JOIN | | 96 | 1536 | 6 (17)| 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| DEPT | 2 | 12 | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | DEPT_UK1 | 2 | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 96 | 960 | 4 (25)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | EMPLOYEES | 96 | 960 | 3 (0)| 00:00:01 |
4 - access("EMPLOYEES"."DEPT_ID"="D"."ID")
filter("EMPLOYEES"."DEPT_ID"="D"."ID")
5 - filter("SALARY">5000)
Once you learn how to get the explain plans and how to read them, you can generally work out what Oracle is doing as it executes your query.

How do I utilize Row_Number() (partitioning) for my datapool correctly

we have following table (output is already ordered and separated for understanding):
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
----------------------------------------------------------------------------
| 3 | 100 | 500 | Change | 2011-01-01 02:00:00 | Z |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
| 4 | 100 | 510 | Create | 2011-01-01 00:30:00 | T |
----------------------------------------------------------------------------
| 5 | 100 | 520 | CreateSystem | 2011-01-01 00:30:00 | A |
----------------------------------------------------------------------------
what is ActionCode? we use this in c# and there it represents an enum-value
what do i want to achieve?
well, i need the following output:
| FK1 | FK2 | ActionCode | SomeAttributeValue |
+-----+-----+--------------+--------------------+
| 100 | 500 | Create | H |
| 100 | 500 | Create | Z |
| 100 | 510 | Create | T |
| 100 | 520 | CreateSystem | A |
-------------------------------------------------
well, what is the actual logic?
we have some logical groups for composite-key (FK1 + FK2). each of these groups can be broken into partitions, which begin with Create or CreateSystem. each partition ends with Create, CreateSystem or Change. The actual value of SomeAttributeValue for each partition should be the value from the last line of the partition.
it is not possible to have following datapool:
| PK | FK1 | FK2 | ActionCode | CreationTS | SomeAttributeValue |
+----+-----+-----+--------------+---------------------+--------------------+
| 7 | 100 | 500 | Change | 2011-01-02 02:00:00 | Z |
| 6 | 100 | 500 | Create | 2011-01-02 00:00:00 | H |
| 2 | 100 | 500 | Change | 2011-01-01 01:00:00 | X |
| 1 | 100 | 500 | Create | 2011-01-01 00:00:00 | Y |
----------------------------------------------------------------------------
and then expect PK 7 to affect PK 2 or PK 6 to affect PK 1.
i don't even know how/where to start ... how can i achieve this?
we are running on mssql 2005+
EDIT:
there's a dump available:
instanceId: my PK
tenantId: FK 1
campaignId: FK 2
callId: FK 3
refillCounter: FK 4
ticketType: ActionCode (1 & 4 & 6 are Create, 5 is Change, 3 must be ignored)
ticketType, profileId, contactPersonId, ownerId, handlingStartTime, handlingEndTime, memo, callWasPreselected, creatorId, creationTS, changerId, changeTS should be taken from the Create (first line in partition in groups)
callingState, reasonId, followUpDate, callingAttempts and callingAttemptsConsecutivelyNotReached should be taken from the last Create (which then would be a "one-line-partition-in-group" / the same as the upper one) or Change (last line in partition in groups)
I'm assuming that each partition can only contain a single Create or CreateSystem, otherwise your requirements are ill-defined. The following is untested, since I don't have a sample table, nor sample data in an easily consumed format:
;With Partitions as (
Select
t1.FK1,
t1.FK2,
t1.CreationTS as StartTS,
t2.CreationTS as EndTS
From
Table t1
left join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.CreationTS < t2.CreationTS and
t2.ActionCode in ('Create','CreateSystem')
left join
Table t3
on
t1.FK1 = t3.FK1 and
t1.FK2 = t3.FK2 and
t1.CreationTS < t3.CreationTS and
t3.CreationTS < t2.CreationTS and
t3.ActionCode in ('Create','CreateSystem')
where
t1.ActionCode in ('Create','CreateSystem') and
t3.FK1 is null
), PartitionRows as (
SELECT
t1.FK1,
t1.FK2,
t1.ActionCode,
t2.SomeAttributeValue,
ROW_NUMBER() OVER (PARTITION_FRAGMENT_ID BY t1.FK1,T1.FK2,t1.StartTS ORDER BY t2.CreationTS desc) as rn
from
Partitions t1
inner join
Table t2
on
t1.FK1 = t2.FK1 and
t1.FK2 = t2.FK2 and
t1.StartTS <= t2.CreationTS and
(t2.CreationTS < t1.EndTS or t1.EndTS is null)
)
select * from PartitionRows where rn = 1
(Please note than I'm using all kinds of reserved names here)
The basic logic is: The Partitions CTE is used to define each partition in terms of the FK1, FK2, an inclusive start timestamp, and exclusive end timestamp. It does this by a triple join to the base table. the rows from t2 are selected to occur after the rows from t1, then the rows from t3 are selected to occur between the matching rows from t1 and t2. Then, in the WHERE clause, we exclude any rows from the result set where a match occurred from t3 - the result being that the row from t1 and the row from t2 represent the start of two adjacent partitions.
The second CTE then retrieves all rows from Table for each partition, but assigning a ROW_NUMBER() score within each partition, based on the CreationTS, sorted descending, with the result that ROW_NUMBER() 1 within each partition is the last row to occur.
Finally, within the select, we choose those rows that occur last within their respective partitions.
This does all assume that CreationTS values are distinct within each partition. I may be able to re-work it using PK also, if that assumption doesn't hold up.
It is solvable with a recursive CTE. Here (assuming rows within partitions are ordered by CreationTS):
WITH partitioned AS (
SELECT
*,
rn = ROW_NUMBER() OVER (PARTITION BY FK1, FK2 ORDER BY CreationTS)
FROM data
),
subgroups AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup = 1,
Subrank = 1
FROM partitioned
WHERE rn = 1
UNION ALL
SELECT
p.PK, p.FK1, p.FK2, p.ActionCode, p.CreationTS, p.SomeAttributeValue, p.rn,
Subgroup = s.Subgroup + CASE p.ActionCode WHEN 'Change' THEN 0 ELSE 1 END,
Subrank = CASE p.ActionCode WHEN 'Change' THEN s.Subrank ELSE 0 END + 1
FROM partitioned p
INNER JOIN subgroups s ON p.FK1 = s.FK1 AND p.FK2 = s.FK2
AND p.rn = s.rn + 1
),
finalranks AS (
SELECT
PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue, rn,
Subgroup, Subrank,
rank = ROW_NUMBER() OVER (PARTITION BY FK1, FK2, Subgroup ORDER BY Subrank DESC)
/* or: rank = MAX(Subrank) OVER (PARTITION BY FK1, FK2, Subgroup) - Subrank + 1 */
FROM subgroups
)
SELECT PK, FK1, FK2, ActionCode, CreationTS, SomeAttributeValue
FROM finalranks
WHERE rank = 1

MySQL I want to optimize this further

So I started off with this query:
SELECT * FROM TABLE1 WHERE hash IN (SELECT id FROM temptable);
It took forever, so I ran an explain:
mysql> explain SELECT * FROM TABLE1 WHERE hash IN (SELECT id FROM temptable);
+----+--------------------+-----------------+------+---------------+------+---------+------+------------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-----------------+------+---------------+------+---------+------+------------+-------------+
| 1 | PRIMARY | TABLE1 | ALL | NULL | NULL | NULL | NULL | 2554388553 | Using where |
| 2 | DEPENDENT SUBQUERY | temptable | ALL | NULL | NULL | NULL | NULL | 1506 | Using where |
+----+--------------------+-----------------+------+---------------+------+---------+------+------------+-------------+
2 rows in set (0.01 sec)
It wasn't using an index. So, my second pass:
mysql> explain SELECT * FROM TABLE1 JOIN temptable ON TABLE1.hash=temptable.hash;
+----+-------------+-----------------+------+---------------+----------+---------+------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+------+---------------+----------+---------+------------------------+------+-------------+
| 1 | SIMPLE | temptable | ALL | hash | NULL | NULL | NULL | 1506 | |
| 1 | SIMPLE | TABLE1 | ref | hash | hash | 5 | testdb.temptable.hash | 527 | Using where |
+----+-------------+-----------------+------+---------------+----------+---------+------------------------+------+-------------+
2 rows in set (0.00 sec)
Can I do any other optimization?
You can gain some more speed by using a covering index, at the cost of extra space consumption. A covering index is one which can satisfy all requested columns in a query without performing a further lookup into the clustered index.
First of all get rid of the SELECT * and explicitly select the fields that you require. Then you can add all the fields in the SELECT clause to the right hand side of your composite index. For example, if your query will look like this:
SELECT first_name, last_name, age
FROM table1
JOIN temptable ON table1.hash = temptable.hash;
Then you can have a covering index that looks like this:
CREATE INDEX ix_index ON table1 (hash, first_name, last_name, age);

Resources