Query is not using Index - query-optimization

Below is my query with its complete explain plan. The plan shows a sequential scan on ach_batches, filtering on ach_batch_date, even though an index exists on that column. Query performance itself is not bad; the problem is that we observe significant I/O peaks on the DB server.
QUERY: (OPTIMIZATION TIMESTAMP: 01-23-2021 04:23:45) (FIRST_ROWS OPTIMIZATION)
------
SELECT
ach_batches.ach_batch_sr_no,
ach_batches.ach_origin_id,
ach_batches.odfi_routing_no,
ach_batches.ach_batch_date,
ach_batches.company_name,
ach_batches.ach_tot_dr_amount,
ach_batches.ach_tot_cr_amount,
ach_batch_details.ach_batch_rec_no,
ach_batch_details.trans_id,
ach_batch_details.ach_trans_code,
ach_batch_details.rdfi_routing_no,
ach_batch_details.dfi_account_no,
ach_batch_details.amount,
ach_batch_details.receiver_id,
ach_batch_details.receiver_name,
ach_batch_details.ach_bat_ret_code,
transaction_types.trans_type_abrv,
b_settle_statuses.name
FROM
( ( ach_batches ach_batches
INNER JOIN ach_batch_details ach_batch_details
ON
ach_batches.ach_batch_sr_no=ach_batch_details.ach_batch_sr_no )
INNER JOIN transaction_types transaction_types
ON
ach_batch_details.trans_type=transaction_types.trans_type )
INNER JOIN b_settle_statuses b_settle_statuses
ON
ach_batch_details.settle_status=b_settle_statuses.status
WHERE
(
(
ach_batches.ach_batch_date >= '01/09/2021'
)
)
AND
(
(
ach_batches.ach_batch_date <= '01/10/2021'
)
)
ORDER BY
ach_batches.ach_batch_sr_no,
ach_batch_details.ach_batch_rec_no
Estimated Cost: 46114
Estimated # of Rows Returned: 1
Temporary Files Required For: Order By
1) hfarooq.ach_batches: SEQUENTIAL SCAN
Filters: (hfarooq.ach_batches.ach_batch_date >= 01/09/2021 AND hfarooq.ach_batches.ach_batch_date <= 01/10/2021 )
2) hfarooq.ach_batch_details: INDEX PATH
(1) Index Name: informix. 122_103
Index Keys: ach_batch_sr_no ach_batch_rec_no (Serial, fragments: ALL)
Lower Index Filter: hfarooq.ach_batches.ach_batch_sr_no = hfarooq.ach_batch_details.ach_batch_sr_no
NESTED LOOP JOIN
3) hfarooq.transaction_types: INDEX PATH
(1) Index Name: mcp. 323_850
Index Keys: trans_type (Serial, fragments: ALL)
Lower Index Filter: hfarooq.ach_batch_details.trans_type = hfarooq.transaction_types.trans_type
NESTED LOOP JOIN
4) hfarooq.b_settle_statuses: INDEX PATH
(1) Index Name: mcp. 2420_8748
Index Keys: status (Serial, fragments: ALL)
Lower Index Filter: hfarooq.ach_batch_details.settle_status = hfarooq.b_settle_statuses.status
NESTED LOOP JOIN
Table's index:


Assert different conditions on fields from multiple tables

Let's say I have a dimension and a fact table:
SELECT
LAST_DAY(TO_Date((f.Date),'MON-YYYY')) "MonthYear",
f.Region,
d.Formula as "Molecule",
d.product_name,
d.supplier,
SUM(f.Sales) as "Sales",
SUM(f.ext_units) as "ext_Units",
SUM(f.units) "Units",
SUM(f.ext_units) / SUM(f.units) as "Product_units",
CASE
WHEN d.Formula = 'ABC:CBA' THEN 'ABC'
WHEN d.Formula = 'DEF-FED' THEN 'DEF'
WHEN d.Formula = 'xyz;zyx' THEN 'xyz'
ELSE d.Formula
END AS "Molecule"
FROM Fact.Sales f
INNER JOIN DM_Product d on f.P-Code = d.P-Code
WHERE
f.Region = 'Mars'
AND d.Supplier = 'Simpsons'
AND d.Formula in ('ABC','DEF','GHI','JKL','BDHJK',
'FGL','MNP','RSTU', 'KCL', 'xyz', 'UWX',
'xyz;zyx', 'DEF-FED', 'ABC:CBA')
GROUP BY f.Date,
f.Region,
d.Formula,
d.product_name,
d.supplier
So, basically, what I intend to do is get all these molecules from the product dimension table and find their relevant sales. The puzzle is that I have two different conditions:
1) Look at all the two-syllable formulas and aggregate them with the equivalent one-syllable formula to get aggregate sales.
2) Find "Product_units" using SUM(f.ext_units) / SUM(f.units), as in my code. Note that f.units does contain 0 values.
For the first condition I used CASE WHEN, and it works perfectly to find the two-syllable formulas and map them to the relevant one-syllable formula. For the second condition, I need to know how to make sure the divisor is not 0, that is, something like: if SUM(f.units) <> 0 THEN SUM(f.ext_units) / SUM(f.units) ELSE 0?
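A minimal sketch of the guarded division described in the question, run against an in-memory SQLite database from Python rather than the asker's actual database. The `sales` table, its columns, and the sample rows are invented for illustration. The CASE expression tests SUM(units) before dividing, and the formula mapping is repeated in GROUP BY so that the mapped formulas are aggregated together.

```python
import sqlite3

# Hypothetical mapping of two-part formulas onto their one-part equivalent.
MOLECULE = """CASE formula
        WHEN 'ABC:CBA' THEN 'ABC'
        WHEN 'xyz;zyx' THEN 'xyz'
        ELSE formula END"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (formula TEXT, ext_units REAL, units REAL);
    INSERT INTO sales VALUES
        ('ABC', 10, 2), ('ABC:CBA', 20, 3),
        ('xyz', 5, 0), ('xyz;zyx', 7, 0);
""")

# Divide only when the summed divisor is non-zero; otherwise return 0.
query = f"""
    SELECT {MOLECULE} AS molecule,
           CASE WHEN SUM(units) <> 0
                THEN SUM(ext_units) / SUM(units)
                ELSE 0
           END AS product_units
    FROM sales
    GROUP BY {MOLECULE}
    ORDER BY molecule
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

Grouping by the mapped expression rather than the raw formula column is what merges 'ABC:CBA' into 'ABC'; whether the same grouping is wanted in the original query is an assumption.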

What is the most efficient way to check for path non-existence in a non-selective query?

I have a graph model that contains three types of vertices (User, Group, Document) and two types of edges (member_of, permissions). The relationships can be expressed as:
User,Group --- member_of ---> Group (depth can be arbitrary)
Group --- permissions ---> Document (depth is 1)
I'm working to write a query that would answer "What are all of the users that have no permissions of any document?". This is a very non-selective query, as I'm not specifying an id for the User class.
I've come up with this solution:
SELECT id, name FROM User
LET $p = (
SELECT expand(outE('permissions')) FROM (
TRAVERSE out('member_of') FROM $parent.$current
)
)
WHERE $p.size() = 0
This solution appears to work, but is taking between 12-15 seconds to execute. Currently in my graph there are 10,000 Users, Groups and Documents each. There are ~10,000 permissions and ~50,000 member_of.
What is the most efficient way to check for path non-existence? Is there any way to improve the performance of my existing query or am I taking the wrong approach?
There are a few ways to improve your query. First, it isn't necessary to expand the Permissions edges; you can simply check the number of edges in the query. We can also limit this check so that it stops at the first group with Permissions edges, rather than checking them all (credit to Luigi D for giving me this idea). The query then becomes:
SELECT * FROM User
LET $p = (
SELECT FROM (
TRAVERSE out('Member_Of') FROM $parent.$current
) WHERE out('Permissions').size() > 0 LIMIT 1
)
WHERE $p.size() = 0
It's hard for me to verify any query improvements without a sizeable dataset, but there may be a minute improvement from using the more explicit out_Member_Of and out_Permissions properties rather than the out(field) functions.
There might be another opportunity to slightly improve the query by 'removing' the User record from the traverse results, thus reducing the number of records checked by the WHERE clause. This could be done via:
SELECT * FROM User
LET $p = (
SELECT FROM (
TRAVERSE out('Member_Of') FROM (SELECT out('Member_Of') FROM $parent.$parent.$current)
) WHERE out('Permissions').size() > 0 LIMIT 1
)
WHERE $p.size() = 0
The previous query can also be rearranged, although I suspect this one will be slower because it checks all of the traversed results rather than stopping at the first. It's just another option for you to try.
SELECT * FROM User
LET $p = (TRAVERSE out('Member_Of') FROM (SELECT out('Member_Of') FROM $parent.$current))
WHERE $p.out('Permissions').size() = 0
Now I'm going to diverge from that query. Perhaps it will be quicker to pre-compute whether a group has access to docs, and then check each user's groups against the pre-computed ones. This may save a lot of repetitive traversal.
I think the best way is to get all the Groups without docs. This way, all groups with docs can be eliminated before traversing their other groups.
SELECT * FROM (SELECT FROM Group WHERE out('Permissions').size() = 0)
LET $p = (
SELECT FROM (
TRAVERSE out('Member_Of') FROM $parent.$current
) WHERE out('Permissions').size() > 0 LIMIT 1
)
WHERE $p.size() = 0
Perhaps creating and using an index will make the previous query even more performant, although the process currently seems a bit janky. Before you can create an index on out_Permissions, you need to create the property with create property Group.out_Permissions LINKBAG; then you can create the index with CREATE INDEX hasDocument ON Groups (out_Permissions, @rid) notunique METADATA {ignoreNullValues: false} (creating the index this way seems strange, but it was the only way I could get it to work, hence my janky comment). You can then query the index with select expand(rid) from index:hasDocument where key = null, which returns all the Groups without permission edges and would replace SELECT FROM Group WHERE out('Permissions').size() = 0 in the previous query.
So here is the query that gets the groups without docs and checks the users against them. It correctly returns users without groups too.
SELECT expand($users)
LET $groups_without_docs = (
SELECT FROM (SELECT FROM Group WHERE out('Permissions').size() = 0)
LET $p = (
SELECT FROM (
TRAVERSE out('Member_Of') FROM $parent.$current
) WHERE out('Permissions').size() > 0 LIMIT 1
)
WHERE $p.size() = 0
),
$users = (
SELECT FROM User
LET $groups = (SELECT expand(out('Member_Of')) FROM $current)
WHERE $groups containsall (@rid in $parent.$groups_without_docs)
)
Note: I think $users = (SELECT FROM User WHERE out('Member_Of') containsall (@rid in $parent.$groups_without_docs)) should work, but it doesn't. I think this may be related to a bug I've previously posted; see https://github.com/orientechnologies/orientdb/issues/4692.
I am very interested to know whether the various queries above improve your query's performance, so please comment back.
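The pre-computation idea above can be sketched outside the database. This is plain Python over in-memory dictionaries, not OrientDB API code, and the vertex names and the `users_without_permissions` helper are made up for illustration. It starts from the groups that hold permissions and walks member_of edges in reverse, so every vertex is visited at most once, then reports the users that were never reached.

```python
from collections import defaultdict, deque


def users_without_permissions(users, member_of, permissions):
    """Return the users that cannot reach any group holding a permission.

    member_of maps a user or group to the groups it belongs to;
    permissions maps a group to the documents it can access.
    """
    # Reverse the member_of edges: group -> its direct members.
    members = defaultdict(list)
    for child, parents in member_of.items():
        for parent in parents:
            members[parent].append(child)

    # Seed with the groups that have at least one permissions edge.
    reachable = {group for group, docs in permissions.items() if docs}
    queue = deque(reachable)

    # Breadth-first walk down to every direct or indirect member.
    while queue:
        group = queue.popleft()
        for child in members[group]:
            if child not in reachable:
                reachable.add(child)
                queue.append(child)

    return [user for user in users if user not in reachable]


# Hypothetical sample graph: u1 -> g1 -> g2 -> d1, u2 -> g3, u3 has no group.
member_of = {"u1": ["g1"], "u2": ["g3"], "u3": [], "g1": ["g2"]}
permissions = {"g2": ["d1"]}
result = users_without_permissions(["u1", "u2", "u3"], member_of, permissions)
print(result)
```

Because visited vertices are tracked in a set, the walk also terminates if the membership graph contains cycles.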
As you said, this is a very non-selective query, so it's hard to optimize.
Have you tried to add a LIMIT to the inner query?
SELECT id, name FROM User
LET $p = (
SELECT expand(outE('permissions')) FROM (
TRAVERSE out('member_of') FROM $parent.$current
) LIMIT 1
)
WHERE $p.size() = 0
or even
SELECT id, name FROM User
LET $p = (
SELECT sum(outE('permissions').size()) as s FROM (
TRAVERSE out('member_of') FROM $parent.$current
)
)
WHERE $p[0].s = 0

An aggregate may not appear in the WHERE clause unless it is in a subquery

I am trying to use the query below to sort the consumer list based on
1) the actual count, and then
2) a subcount based on values such as sla_state = 1 and result = 0.
Query:
select consumer as "Consumer", class_name as "Service", count(consumer) as "totalcount", avg(responsetime) as "AvgResponseTime (ms)", max(responsetime) as "Max ResponseTime (ms)" , sla_state as "sla", result as "result_state" , count(1) as "subcount"
from
DPOWER.business_transaction bt join DPOWER.mmfg_business_transaction mbt on
(bt.business_trans_id = mbt.business_trans_id) join DPOWER.transaction_class tc on (bt.class_id = tc.class_id) and sla_state = 1 and result=0
where
(bt.starttime >= '20150701000000000000' and bt.endtime <= '20150801000000000000') group by consumer, sla_state, result, class_name order by consumer
The above query works, but I am only able to get the subcount and not the total count per consumer. Below are the three table structures. Can anyone figure out how to get the total count? (I tried everything I could think of, such as count(*), but that didn't work. Also, if I use aliases I get a "multi-part identifier could not be bound" error.)
Could you mean this in your WHERE clause?
where
consumer in
(
select
consumer
from
DPOWER.business_transaction
where
sla_state = 1
and result=0
)
and (bt.starttime >= '20150701000000000000'
and bt.endtime <= '20150801000000000000')
To get you started, here's an idea:
select
bt1.consumer as "Consumer",
count(*) as totalcount,
count(bt2.consumer) as subcount
from
DPOWER.business_transaction bt1
left outer join DPOWER.business_transaction bt2
on bt1.consumer = bt2.consumer
and bt2.sla_state = 1
and bt2.result = 0
group by
bt1.consumer
order by bt1.consumer
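Another way to get both counts in one pass is conditional aggregation, sketched here against an in-memory SQLite table from Python. This is an illustration, not the asker's schema: it assumes sla_state and result live on business_transaction (as the subquery suggestion above does), and the sample rows are invented. COUNT(*) gives the total per consumer, while the SUM over a CASE counts only the rows matching the sub-condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business_transaction (
        consumer TEXT, sla_state INTEGER, result INTEGER
    );
    INSERT INTO business_transaction VALUES
        ('a', 1, 0), ('a', 1, 0), ('a', 0, 0),
        ('b', 0, 1), ('b', 1, 0);
""")

# totalcount counts every row; subcount counts only rows meeting the condition.
rows = conn.execute("""
    SELECT consumer,
           COUNT(*) AS totalcount,
           SUM(CASE WHEN sla_state = 1 AND result = 0 THEN 1 ELSE 0 END) AS subcount
    FROM business_transaction
    GROUP BY consumer
    ORDER BY consumer
""").fetchall()
conn.close()
print(rows)
```

Unlike a self-join on consumer, this does not multiply rows, so the total is not inflated when a consumer has several matching transactions.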

How to avoid repeating a field value in access query result

I have the below tables in my DB.
Orders:
OrderNo  ItemNo  OrderQty
1000     10      10
2000     20      10
VendorPO:
OrderNo  ItemNo  VendorPO  POQty
1000     10      100       5
2000     20      100       7
2000     20      200       3
And I used this Query:
SELECT Order.OrderNo, Order.ItemNo, Order.OrderQty, VendorPO.VendorPO, VendorPO.POQty
FROM [Order]
INNER JOIN VendorPO ON (Order.ItemNo = VendorPO.ItemNo)
AND (Order.OrderNo = VendorPO.OrderNo);
With these results:
OrderNo  ItemNo  OrderQty  VendorPO  POQty
1000     10      10        100       5
2000     20      10        100       7
2000     20      10        200       3
I want to avoid repeating the order quantity in the query result: the order quantity 10 for order 2000 appears twice, once for each of the two PO references (POQty 7 and 3).
I would appreciate help finding a way to avoid this repetition.
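One possible reading of the desired output is to show OrderQty only on the first PO line of each order and leave it empty on the rest. The sketch below tries that with SQLite from Python, not Access, so the syntax would need adapting (Access uses IIf rather than CASE). The table is named Orders here, and choosing the lowest VendorPO as the "first" line is an assumption.

```python
import sqlite3


def order_qty_once_per_order():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Orders (OrderNo INTEGER, ItemNo INTEGER, OrderQty INTEGER);
        CREATE TABLE VendorPO (OrderNo INTEGER, ItemNo INTEGER,
                               VendorPO INTEGER, POQty INTEGER);
        INSERT INTO Orders VALUES (1000, 10, 10), (2000, 20, 10);
        INSERT INTO VendorPO VALUES
            (1000, 10, 100, 5), (2000, 20, 100, 7), (2000, 20, 200, 3);
    """)
    # Keep OrderQty only on the row whose VendorPO is the lowest for that
    # order line; every other row gets NULL in that column.
    rows = conn.execute("""
        SELECT o.OrderNo, o.ItemNo,
               CASE WHEN v.VendorPO = (SELECT MIN(v2.VendorPO)
                                       FROM VendorPO v2
                                       WHERE v2.OrderNo = v.OrderNo
                                         AND v2.ItemNo = v.ItemNo)
                    THEN o.OrderQty
               END AS OrderQty,
               v.VendorPO, v.POQty
        FROM Orders o
        JOIN VendorPO v ON o.OrderNo = v.OrderNo AND o.ItemNo = v.ItemNo
        ORDER BY o.OrderNo, o.ItemNo, v.VendorPO
    """).fetchall()
    conn.close()
    return rows


rows = order_qty_once_per_order()
print(rows)
```

If the goal is instead one row per order with the PO quantities summed, a GROUP BY on the order columns would be the simpler route.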

SQL Server 2008 Stored Proc Performance where Column = NULL

When I execute a certain stored procedure (which selects from a non-indexed view) with a non-null parameter, it's lightning fast at about 10ms. When I execute it with a NULL parameter (resulting in a FKColumn = NULL query) it's much slower at about 1200ms.
I've executed it with the actual execution plan and it appears the most costly portion of the query is a clustered index scan with the predicate IS NULL on the fk column in question - 59%! The index covering this column is (AFAIK) good.
So what can I do to improve the performance here? Change the fk column to NOT NULL and fill the nulls with a default value?
SELECT top 20 dbo.vwStreamItems.ItemId
,dbo.vwStreamItems.ItemType
,dbo.vwStreamItems.AuthorId
,dbo.vwStreamItems.AuthorPreviewImageURL
,dbo.vwStreamItems.AuthorThumbImageURL
,dbo.vwStreamItems.AuthorName
,dbo.vwStreamItems.AuthorLocation
,dbo.vwStreamItems.ItemText
,dbo.vwStreamItems.ItemLat
,dbo.vwStreamItems.ItemLng
,dbo.vwStreamItems.CommentCount
,dbo.vwStreamItems.PhotoCount
,dbo.vwStreamItems.VideoCount
,dbo.vwStreamItems.CreateDate
,dbo.vwStreamItems.Language
,dbo.vwStreamItems.ProfileIsFriendsOnly
,dbo.vwStreamItems.IsActive
,dbo.vwStreamItems.LocationIsFriendsOnly
,dbo.vwStreamItems.IsFriendsOnly
,dbo.vwStreamItems.IsDeleted
,dbo.vwStreamItems.StreamId
,dbo.vwStreamItems.StreamName
,dbo.vwStreamItems.StreamOwnerId
,dbo.vwStreamItems.StreamIsDeleted
,dbo.vwStreamItems.RecipientId
,dbo.vwStreamItems.RecipientName
,dbo.vwStreamItems.StreamIsPrivate
,dbo.GetUserIsFriend(@RequestingUserId, vwStreamItems.AuthorId) as IsFriend
,dbo.GetObjectIsBookmarked(@RequestingUserId, vwStreamItems.ItemId) as IsBookmarked
from dbo.vwStreamItems WITH (NOLOCK)
where 1 = 1
and vwStreamItems.IsActive = 1
and vwStreamItems.IsDeleted = 0
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId is NULL
or
ItemType = 'Stream'
)
order by CreateDate desc
When it's not null, do you have
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId = 'xxx'
or
ItemType = 'Stream'
)
or
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId = 'xxx'
)
You have an OR clause there which is most likely the problem, not the IS NULL as such.
The plans will show why: the OR forces a SCAN but it's manageable with StreamId = 'xxx'. When you use IS NULL, you lose selectivity.
I'd suggest changing your index to make StreamId the right-most column.
However, a view is simply a macro that expands so the underlying query on the base tables could be complex and not easy to optimise...
The biggest performance gain would probably come from dropping the GetUserIsFriend and GetObjectIsBookmarked functions and using JOINs to get the same functionality. Calling scalar functions inside a query is basically the same as using a FOR loop: the function is invoked row by row to determine each value. If you join tables instead, the values for all rows are determined together, as a set, in one pass.
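The difference between a per-row function and a join can be illustrated with a small SQLite sketch in Python. It is not SQL Server code: the `items` and `friends` tables, the `get_user_is_friend` function, and the sample data are hypothetical stand-ins for the view and dbo.GetUserIsFriend. The user-defined function is called once for each output row, while the LEFT JOIN produces the same flag as part of a single set-based query.

```python
import sqlite3


def compare_function_vs_join(requesting_user_id=99):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, author_id INTEGER);
        CREATE TABLE friends (user_id INTEGER, friend_id INTEGER,
                              PRIMARY KEY (user_id, friend_id));
        INSERT INTO items VALUES (1, 10), (2, 20), (3, 30);
        INSERT INTO friends VALUES (99, 10), (99, 30);
    """)
    friend_pairs = set(conn.execute("SELECT user_id, friend_id FROM friends"))
    calls = 0

    def get_user_is_friend(user_id, author_id):
        # Stand-in for a scalar function: evaluated separately for each row.
        nonlocal calls
        calls += 1
        return 1 if (user_id, author_id) in friend_pairs else 0

    conn.create_function("get_user_is_friend", 2, get_user_is_friend)

    per_row = conn.execute(
        "SELECT item_id, get_user_is_friend(?, author_id) "
        "FROM items ORDER BY item_id",
        (requesting_user_id,),
    ).fetchall()

    # Same flag computed by joining, with no per-row function call.
    joined = conn.execute(
        """
        SELECT i.item_id,
               CASE WHEN f.friend_id IS NULL THEN 0 ELSE 1 END AS is_friend
        FROM items i
        LEFT JOIN friends f
               ON f.user_id = ? AND f.friend_id = i.author_id
        ORDER BY i.item_id
        """,
        (requesting_user_id,),
    ).fetchall()
    conn.close()
    return per_row, joined, calls


per_row, joined, calls = compare_function_vs_join()
print(per_row, joined, calls)
```

Both queries return the same rows; the point of the sketch is only that the function path does work per row, which is what the join avoids.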
