PostgreSQL - Not using indexes when querying multiple values along with CTE

I recently altered the 'resources' table in my DB so that it can support multiple versions of resources.
'resources' schema:
I have one index on the 'id' column and another on the 'parent_res_id' column.
The query I use most often fetches all relatives of the given resources:
WITH r2(id) AS (
SELECT r2.parent_res_id
FROM resources r2
WHERE r2.id IN ( 2, 3 )
)
SELECT r.id
FROM resources r
WHERE
(
r.id IN ( 2, 3 )
OR
r.parent_res_id IN ( 2, 3 )
OR
r.id IN (SELECT r2.id FROM r2)
OR
r.parent_res_id IN (SELECT r2.id FROM r2))
;
However, when I EXPLAIN ANALYZE this query, Postgres does a seq scan in the main query instead of using the indexes. It uses an index only for the CTE, and only when I pass a single value in the IN clause.
Query plan with only one param:
Seq Scan on resources r (cost=8.21..21.25 rows=114 width=4) (actual time=0.077..0.078 rows=0 loops=1)
Filter: ((id = 240) OR (parent_res_id = 240) OR (hashed SubPlan 2) OR (hashed SubPlan 3))
Rows Removed by Filter: 153
Buffers: shared hit=13 dirtied=1
CTE f2
-> Index Scan using resources_pkey on resources r2 (cost=0.14..8.16 rows=1 width=4) (actual time=0.010..0.010 rows=0 loops=1)
Index Cond: (id = 240)
Buffers: shared hit=3 dirtied=1
SubPlan 2
-> CTE Scan on r2 r2_1 (cost=0.00..0.02 rows=1 width=4) (actual time=0.011..0.011 rows=0 loops=1)
Buffers: shared hit=3 dirtied=1
SubPlan 3
-> CTE Scan on r2 r2_2 (cost=0.00..0.02 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=1)
Planning Time: 0.122 ms
Execution Time: 0.104 ms
Query Plan with two or more params:
Seq Scan on resources r (cost=11.99..25.03 rows=115 width=4) (actual time=0.097..0.119 rows=11 loops=1)
Filter: ((id = ANY ('{211,270}'::integer[])) OR (parent_res_id = ANY ('{211,270}'::integer[])) OR (hashed SubPlan 2) OR (hashed SubPlan 3))
Rows Removed by Filter: 142
Buffers: shared hit=20
CTE f2
-> Seq Scan on resources r2 (cost=0.00..11.90 rows=2 width=4) (actual time=0.024..0.033 rows=1 loops=1)
Filter: (id = ANY ('{211,270}'::integer[]))
Rows Removed by Filter: 152
Buffers: shared hit=10
SubPlan 2
-> CTE Scan on r2 r2_1 (cost=0.00..0.04 rows=2 width=4) (actual time=0.029..0.039 rows=1 loops=1)
Buffers: shared hit=10
SubPlan 3
-> CTE Scan on r2 r2_2 (cost=0.00..0.04 rows=2 width=4) (actual time=0.000..0.000 rows=1 loops=1)
Planning Time: 0.132 ms
Execution Time: 0.152 ms
What is the reason for this and could this be optimized for better performance?

Related

Postgres jsonb array join

I have a jsonb document in a table. The document contains an array of cameraIds. I am trying to join this data with the cameras table, a normal table where cameraId is a column, and return unique rows from the table with the jsonb column (which is why I use GROUP BY in my query).
Any advice on how to optimize this query for performance would be greatly appreciated.
JSONB Col Example:
{
"date": {
"end": "2018-11-02T22:00:00.000Z",
"start": "2018-11-02T14:30:00.000Z"
},
"cameraIds": [100, 101],
"networkId": 5,
"filters": [],
"includeUnprocessed": true,
"reason": "some reason",
"vehicleFilter": {
"bodyInfo": "something",
"lpInfo": "something"
}
}
Query:
select ssr.id,
a.name as user_name,
ssr.start_date,
ssr.end_date,
ssr.created_at,
ssr.payload -> 'filters' as pretty_filters,
ssr.payload -> 'reason' as reason,
ssr.payload -> 'includePlates' as include_plates,
ssr.payload -> 'vehicleFilter' -> 'bodyInfo' as vbf,
ssr.payload -> 'vehicleFilter' -> 'lpInfo' as lpInfo,
array_agg(n.name) filter (where n.organization_id = ${orgId}) as network_names,
array_agg(c.name) filter (where n.organization_id = ${orgId}) as camera_names
from
ssr
cross join jsonb_array_elements(ssr.payload -> 'cameraIds') camera_id
inner join cameras as c on c.id = camera_id::int
inner join networks as n on n.id = c.network_id
inner join accounts as a on ssr.account_id = a.id
where n.organization_id = ${someId}
and ssr.created_at between ${startDate} and ${endDate}
group by 1,2,3,4,5,6,7,8,9,10
order BY ssr.created_at desc
OFFSET 0
LIMIT 25;
Your query says:
where n.organization_id = ${someId}
But then the aggregate FILTER says:
where n.organization_id = ${orgId}
... which is a contradiction. The aggregated arrays would always be empty, except where ${orgId} happens to be the same as ${someId}, but then the FILTER clause is useless noise. In other words, the query doesn't seem to make sense as given.
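The contradiction can be sketched outside the database. The following is a minimal Python model of a single GROUP BY group, not PostgreSQL itself; the function name aggregate_names and the sample rows are made up for illustration. It applies the WHERE condition first and the aggregate FILTER second, the same order SQL uses.

```python
def aggregate_names(rows, where_org, filter_org):
    """Model one GROUP BY group: apply WHERE first, then the aggregate FILTER."""
    # WHERE n.organization_id = where_org:
    # rows from other organizations never reach the aggregate.
    kept = [r for r in rows if r["organization_id"] == where_org]
    # array_agg(name) FILTER (WHERE n.organization_id = filter_org)
    names = [r["name"] for r in kept if r["organization_id"] == filter_org]
    # array_agg over zero input rows yields NULL, modelled here as None.
    return names or None


# Hypothetical sample data: two cameras in organization 1, one in organization 2.
ROWS = [
    {"name": "cam-a", "organization_id": 1},
    {"name": "cam-b", "organization_id": 1},
    {"name": "cam-c", "organization_id": 2},
]

if __name__ == "__main__":
    print(aggregate_names(ROWS, where_org=1, filter_org=1))  # FILTER changes nothing
    print(aggregate_names(ROWS, where_org=1, filter_org=2))  # FILTER removes everything
```

When both ids match, the FILTER keeps exactly what the WHERE clause already kept; when they differ, nothing is left to aggregate.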
The query might make sense after dropping the aggregate FILTER clauses:
SELECT s.id
, a.name AS user_name
, s.start_date
, s.end_date
, s.created_at
, s.payload ->> 'filters' AS pretty_filters
, s.payload ->> 'reason' AS reason
, s.payload ->> 'includePlates' AS include_plates
, s.payload -> 'vehicleFilter' ->> 'bodyInfo' AS vbf
, s.payload -> 'vehicleFilter' ->> 'lpInfo' AS lpInfo
, cn.camera_names
, cn.network_names
FROM ssr s
JOIN accounts a ON a.id = s.account_id -- assuming referential integrity
CROSS JOIN LATERAL (
SELECT array_agg(c.name) AS camera_names -- sort order?
, array_agg(n.name) AS network_names -- same order? distinct?
FROM jsonb_array_elements_text(s.payload -> 'cameraIds') i(camera_id)
JOIN cameras c ON c.id = i.camera_id::int
JOIN networks n ON n.id = c.network_id
WHERE n.organization_id = ${orgId}
) cn
WHERE s.created_at BETWEEN ${startDate} AND ${endDate} -- ?
ORDER BY s.created_at DESC NULLS LAST
LIMIT 25;
Key is the LATERAL subquery, which avoids duplication of rows from ssr, so we can also drop the outer GROUP BY. Should be considerably faster.
Also note ->> instead of -> and jsonb_array_elements_text(). See:
How to turn JSON array into Postgres array?
I left some question marks at more dubious spots in the query. Notably, BETWEEN is almost always the wrong tool for timestamps. See:
Subtract hours from the now() function

Hierarchical parent query in postgres

I am moving from Oracle to Postgresql. I am trying to convert some Oracle hierarchical queries to Postgres. For example, in Oracle to return a comma-delimited ordered list of all ids under (i.e., the children) and including the id_to_start_with I would do the following:
SELECT LISTAGG(id_something, ',') WITHIN GROUP (ORDER BY id_something) AS somethings FROM(
SELECT DISTINCT D.id_something
FROM something_table D
START WITH D.id_something = :id_to_start_with
CONNECT BY D.id_something_parent = PRIOR D.id_something
)
The equivalent in Postgres would seem to be:
WITH RECURSIVE the_somethings(id_something) AS (
SELECT id_something
FROM something_table
WHERE id_something = $id_to_start_with
UNION ALL
SELECT D.id_something
FROM the_somethings DR
JOIN something_table D ON DR.id_something = D.id_something_parent
)
SELECT string_agg(temp_somethings.id_something::TEXT, ',') AS somethings
FROM (
SELECT id_something
FROM the_somethings
ORDER BY id_something
) AS temp_somethings
Likewise if I want to return a comma-delimited ordered list of all ids above (i.e., the parents) and including the id_to_start_with I would do the following in Oracle:
SELECT LISTAGG(id_something, ',') WITHIN GROUP (ORDER BY id_something) AS somethings FROM(
SELECT DISTINCT D.id_something
FROM something_table D
START WITH D.id_something = :id_to_start_with
CONNECT BY D.id_something = PRIOR D.id_something_parent
)
The equivalent in Postgres would seem to be:
WITH RECURSIVE the_somethings(id_something, path) AS (
SELECT id_something
, id_something::TEXT as path
FROM something_table
WHERE id_something_parent IS NULL
UNION ALL
SELECT D.id_something
, (DR.path || ',' || D.id_something::TEXT) as path
FROM something_table D
JOIN the_somethings DR ON DR.id_something = D.id_something_parent
)
SELECT path
FROM the_somethings
WHERE id_something = $id_to_start_with
ORDER BY id_something
My question has to do with the last Postgres query. It seems terribly inefficient to me and I wonder if there is a better way to write it. That is, in Oracle the query will look for the parent of the id_to_start_with, then the parent of the parent, and so forth to the root.
The Postgres query, on the other hand, gets every single root to child path combination possible and then throws everything away except for the one root to id_to_start_with that I am looking for. That is potentially a ton of data to create just to throw it all away except for the one row I am looking for.
Is there a way to get a comma-delimited ordered list of all the parents of a particular id_to_start_with that is as performant in Postgres as it is in Oracle?
Edit: Adding explain plans from Oracle and Postgres.
Oracle Explain Plan Output
Postgres Explain Analyze Output
CTE Scan on the_somethings (cost=62.27..74.66 rows=3 width=76) (actual time=0.361..0.572 rows=1 loops=1)
Filter: (id_something = 1047)
Rows Removed by Filter: 82
CTE the_somethings
-> Recursive Union (cost=0.00..62.27 rows=551 width=76) (actual time=0.026..0.433 rows=83 loops=1)
-> Seq Scan on something_table (cost=0.00..2.83 rows=1 width=8) (actual time=0.023..0.034 rows=1 loops=1)
Filter: (id_something_parent IS NULL)
Rows Removed by Filter: 82
-> Hash Join (cost=0.33..4.84 rows=55 width=76) (actual time=0.028..0.065 rows=16 loops=5)
Hash Cond: (d.id_something_parent = dr.id_something)
-> Seq Scan on something_table d (cost=0.00..2.83 rows=83 width=16) (actual time=0.002..0.012 rows=83 loops=5)
-> Hash (cost=0.20..0.20 rows=10 width=76) (actual time=0.009..0.009 rows=17 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 10kB
-> WorkTable Scan on the_somethings dr (cost=0.00..0.20 rows=10 width=76) (actual time=0.001..0.004 rows=17 loops=5)
Planning time: 0.407 ms
Execution time: 0.652 ms
This is the final query based on Jakub's answer below.
WITH RECURSIVE the_somethings(id_something, path, level, orig_id, id_something_parent) AS (
SELECT id_something
, id_something::TEXT as path
, 0 as level
, id_something AS orig_id
, id_something_parent
FROM something_table
WHERE id_something IN (1047, 448)
UNION ALL
SELECT D.id_something
, (D.id_something::TEXT || ',' || DR.path) as path
, DR.level + 1 as level
, DR.orig_id as orig_id
, D.id_something_parent
FROM something_table D
JOIN the_somethings DR ON D.id_something = DR.id_something_parent
)
SELECT DISTINCT ON(orig_id) orig_id, path
FROM the_somethings
ORDER BY orig_id, level DESC
;
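CTEs in PostgreSQL are optimization fences: the CTE is materialized first, and only then is the filter from the outer query applied. (Since PostgreSQL 12, simple non-recursive CTEs can be inlined, but recursive CTEs like this one are still materialized.) To make the query perform well, build it the other way around: put the filter inside the CTE, start from the requested ids, and walk up to the parents.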
WITH RECURSIVE the_somethings(id_something, path, level, orig_id, id_something_parent) AS (
SELECT id_something
, id_something::TEXT as path, 0 as level, id_something AS orig_id
, id_something_parent -- carried along so the recursive step can join on it
FROM something_table
WHERE id_something IN ($id_to_start_with,$id_to_start_with2)
UNION ALL
SELECT D.id_something
, (D.id_something::TEXT || ',' || DR.path) as path, DR.level + 1, DR.orig_id
, D.id_something_parent
FROM something_table D
JOIN the_somethings DR ON DR.id_something_parent = D.id_something
)
SELECT DISTINCT ON(orig_id) orig_id, path
FROM the_somethings
ORDER BY orig_id, level DESC
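To see the "filter inside the CTE" idea run end to end, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for PostgreSQL. The helper names demo_connection and ancestor_paths and the five-row tree are assumptions made for illustration. SQLite has no DISTINCT ON, so the deepest row per starting id is picked in Python instead.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory tree: 1 is the root, 2 and 4 are its children."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE something_table ("
        "id_something INTEGER PRIMARY KEY, id_something_parent INTEGER)"
    )
    conn.executemany(
        "INSERT INTO something_table VALUES (?, ?)",
        [(1, None), (2, 1), (3, 2), (4, 1), (5, 4)],
    )
    return conn


def ancestor_paths(conn, start_ids):
    """Return {start_id: 'root,...,start_id'} by walking parent links upward."""
    placeholders = ",".join("?" for _ in start_ids)
    sql = f"""
        WITH RECURSIVE up(id_something, id_something_parent, path, level, orig_id) AS (
            SELECT id_something, id_something_parent,
                   CAST(id_something AS TEXT), 0, id_something
            FROM something_table
            WHERE id_something IN ({placeholders})  -- the filter lives inside the CTE
            UNION ALL
            SELECT d.id_something, d.id_something_parent,
                   d.id_something || ',' || up.path, up.level + 1, up.orig_id
            FROM something_table d
            JOIN up ON d.id_something = up.id_something_parent
        )
        SELECT orig_id, path, level FROM up ORDER BY orig_id, level
    """
    paths = {}
    for orig_id, path, _level in conn.execute(sql, list(start_ids)):
        # Rows arrive by ascending level, so the last row per orig_id
        # is the deepest one and carries the complete path.
        paths[orig_id] = path
    return paths


if __name__ == "__main__":
    print(ancestor_paths(demo_connection(), [3, 5]))
```

Only the rows on the way from each starting id to its root are ever produced, instead of every root-to-child path in the table.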

F# SqlDataConnection Type Provider query optimization

I need some help with the optimization of a query with the F# SqlDataConnection Type Provider.
There is a table Items with the relations to 4 other tables:
Type - n : 1
SubItem1 - m : n
SubItem2 - m : n
SubItem3 - 1 : n
This is the Type Provider query code:
query {
for x in db.Items do
join t in db.ItemType on (x.Typ = t.Name)
select (x, t, x.ItemSubItem1, x.ItemSubItem2, x.SubItem3)
}
|> Seq.map toItem
It produces the following SQL statement:
SELECT [t0].[Id], [t0].[Name], [t1].[Name], [t2].[ItemId], [t2].[SubItem1Id], (
SELECT COUNT(*)
FROM [dbo].[ItemSubItem1] AS [t3]
WHERE [t3].[ItemId] = [t0].[Id]
) AS [value]
FROM [dbo].[Item] AS [t0]
INNER JOIN [dbo].[ItemType] AS [t1] ON [t0].[Typ] = [t1].[Name]
LEFT OUTER JOIN [dbo].[ItemSubItem1] AS [t2] ON [t2].[ItemId] = [t0].[Id]
ORDER BY [t0].[Id], [t1].[Name], [t2].[SubItem1Id]
The problem is that only Type and SubItem1 are joined. So when toItem is called for each entry in Items, 2 extra SQL queries are generated to get SubItem2 and SubItem3. This is very inefficient.
Thanks for help!

Postgresql Table Partition

I have a question about query performance.
I partition a table on a daily basis.
The table creation script is shown below:
-- Table: myschema."auditrtailreference_2014-10-02"
-- DROP TABLE myschema."auditrtailreference_2014-10-02";
CREATE TABLE myschema."auditrtailreference_2014-10-02"
(
-- Inherited from table myschema.auditrtailreference: event smallint,
-- Inherited from table myschema.auditrtailreference: innodeid character varying(80),
-- Inherited from table myschema.auditrtailreference: innodename character varying(80),
-- Inherited from table myschema.auditrtailreference: sourceid character varying(300),
-- Inherited from table myschema.auditrtailreference: intime timestamp without time zone,
-- Inherited from table myschema.auditrtailreference: outnodeid character varying(80),
-- Inherited from table myschema.auditrtailreference: outnodename character varying(80),
-- Inherited from table myschema.auditrtailreference: destinationid character varying(300),
-- Inherited from table myschema.auditrtailreference: outtime timestamp without time zone,
-- Inherited from table myschema.auditrtailreference: bytes integer,
-- Inherited from table myschema.auditrtailreference: cdrs integer,
-- Inherited from table myschema.auditrtailreference: noofsubfilesinfile integer,
-- Inherited from table myschema.auditrtailreference: recordsequencenumberlist character varying(1000),
-- Inherited from table myschema.auditrtailreference: partial_cdrs integer,
-- Inherited from table myschema.auditrtailreference: duplicate_cdrs integer,
-- Inherited from table myschema.auditrtailreference: discarded_cdrs integer,
-- Inherited from table myschema.auditrtailreference: created_cdrs integer,
-- Inherited from table myschema.auditrtailreference: corrupted_cdrs integer,
-- Inherited from table myschema.auditrtailreference: created_files integer,
-- Inherited from table myschema.auditrtailreference: duplicate_files integer,
-- Inherited from table myschema.auditrtailreference: corrupted_files integer,
-- Inherited from table myschema.auditrtailreference: partial_files integer,
-- Inherited from table myschema.auditrtailreference: discarded_files integer,
-- Inherited from table myschema.auditrtailreference: empty_files integer,
CONSTRAINT "auditrtailreference_2014-10-02_intime_check" CHECK (intime >= '2014-10-02 00:00:00'::timestamp without time zone AND intime < '2014-10-03 00:00:00'::timestamp without time zone OR intime IS NULL),
CONSTRAINT "auditrtailreference_2014-10-02_outtime_check" CHECK (outtime >= '2014-10-02 00:00:00'::timestamp without time zone AND outtime < '2014-10-03 00:00:00'::timestamp without time zone OR outtime IS NULL)
)
INHERITS (myschema.auditrtailreference)
WITH (
OIDS=FALSE
);
ALTER TABLE myschema."auditrtailreference_2014-10-02"
OWNER TO erix;
-- Index: myschema."auditrtailreference_2014-10-02_dest_indx1"
-- DROP INDEX myschema."auditrtailreference_2014-10-02_dest_indx1";
CREATE INDEX "auditrtailreference_2014-10-02_dest_indx1"
ON myschema."auditrtailreference_2014-10-02"
USING btree
(destinationid COLLATE pg_catalog."default" );
-- Index: myschema."auditrtailreference_2014-10-02_in_indx1"
-- DROP INDEX myschema."auditrtailreference_2014-10-02_in_indx1";
CREATE INDEX "auditrtailreference_2014-10-02_in_indx1"
ON myschema."auditrtailreference_2014-10-02"
USING btree
(intime );
-- Index: myschema."auditrtailreference_2014-10-02_out_indx1"
-- DROP INDEX myschema."auditrtailreference_2014-10-02_out_indx1";
CREATE INDEX "auditrtailreference_2014-10-02_out_indx1"
ON myschema."auditrtailreference_2014-10-02"
USING btree
(outtime );
-- Index: myschema."auditrtailreference_2014-10-02_srce_indx1"
-- DROP INDEX myschema."auditrtailreference_2014-10-02_srce_indx1";
CREATE INDEX "auditrtailreference_2014-10-02_srce_indx1"
ON myschema."auditrtailreference_2014-10-02"
USING btree
(sourceid COLLATE pg_catalog."default" );
My Query for the data fetch is as follows
select t3.destinationid as input, t1.sourceid as Raw, t1.outtime::text, t7.destinationid, t7.outtime::text as output from myschema.auditrtailreference t1
LEFT JOIN myschema.auditrtailreference t2 on t2.sourceid = t1.destinationid AND t2.event ='80' and t2.outnodename not like '%CRS%' and t2.outnodename not like '%rch' and t2.outtime between '2014/12/11' AND '2014/12/12'
LEFT JOIN myschema.auditrtailreference t3 on t3.sourceid = t2.destinationid AND t3.event ='68' and t3.outtime between '2014/12/11' AND '2014/12/12'
LEFT JOIN myschema.auditrtailreference t4 on t4.sourceid = t3.destinationid AND t4.event ='67' and t4.innodename like 'AIR%ollect%r' and t4.outtime >= t3.outtime and t4.outtime between '2014/12/11' AND '2014/12/12'
LEFT JOIN myschema.auditrtailreference t5 on t5.sourceid = t4.destinationid AND t5.event ='80' and t5.outnodename not like '%ESB%' and t5.outnodename not like '%Type' and t5.outtime between '2014/12/11' AND '2014/12/12'
LEFT JOIN myschema.auditrtailreference t6 on t6.sourceid = t5.destinationid AND t6.event ='68' and t6.outtime between '2014/12/11' AND '2014/12/12'
LEFT JOIN myschema.auditrtailreference t7 on (t7.destinationid = t6.destinationid || '.gz' OR t7.destinationid = t6.destinationid OR t7.destinationid = t6.destinationid || '.csv') AND t7.event ='68' AND (t7.outnodename like '%AIR%_distributer' or t7.outnodename like '%AIR%_Arch' or t7.outnodename like '%AIR%_Distributer' or t7.outnodename like '%AIR%_Distributor' or t7.outnodename like '%AIR%_distributor') and t7.outtime between '2014/12/11' AND '2014/12/12'
where t1.event ='67' and t1.innodename like 'AIR%FTP' and t1.sourceid not like '%my%' and t1.intime >= '2014/12/11 00:00:00' and t1.intime <= '2014/12/11 23:59:59' AND t3.destinationid like '%';
I am facing a performance issue with this query.
Could someone please help me with this?
Thanks a ton in advance.
The EXPLAIN ANALYZE output for the above query is as follows:
Merge Right Join (cost=2230028197.24..10806551039995472.00 rows=432260982728698432 width=136) (actual time=597187.343..1059019.252 rows=3400 loops=1)
Merge Cond: ((t5.sourceid)::text = (t4.destinationid)::text)
-> Nested Loop Left Join (cost=0.03..24192368907.34 rows=14433968717 width=86) (actual time=679.865..1054884.883 rows=353487 loops=1)
Join Filter: (((t7.destinationid)::text = ((t6.destinationid)::text || '.gz'::text)) OR ((t7.destinationid)::text = (t6.destinationid)::text) OR ((t7.destinationid)::text = ((t6.destinationid)::text || '.csv'::text)))
-> Nested Loop Left Join (cost=0.03..15009299.32 rows=497219528 width=78) (actual time=184.474..13354.447 rows=353487 loops=1)
Join Filter: ((t6.sourceid)::text = (t5.destinationid)::text)
-> Merge Append (cost=0.03..7883009.19 rows=311384 width=78) (actual time=184.415..1877.546 rows=353487 loops=1)
Sort Key: t5.sourceid
-> Sort (cost=0.01..0.02 rows=1 width=336) (actual time=0.030..0.030 rows=0 loops=1)
Sort Key: t5.sourceid
Sort Method: quicksort Memory: 25kB
-> Seq Scan on auditrtailreference t5 (cost=0.00..0.00 rows=1 width=336) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (((outnodename)::text !~~ '%ESB%'::text) AND ((outnodename)::text !~~ '%Type'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 80::smallint))
-> Index Scan using "auditrtailreference_2014-12-11_srce_indx1" on "auditrtailreference_2014-12-11" t5 (cost=0.00..3842492.30 rows=311382 width=78) (actual time=99.449..896.976 rows=353478 loops=1)
Filter: (((outnodename)::text !~~ '%ESB%'::text) AND ((outnodename)::text !~~ '%Type'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 80::smallint))
-> Index Scan using "auditrtailreference_2014-12-12_srce_indx1" on "auditrtailreference_2014-12-12" t5 (cost=0.00..4034803.06 rows=1 width=78) (actual time=84.927..699.589 rows=9 loops=1)
Filter: (((outnodename)::text !~~ '%ESB%'::text) AND ((outnodename)::text !~~ '%Type'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 80::smallint))
-> Append (cost=0.00..22.85 rows=3 width=164) (actual time=0.006..0.025 rows=12 loops=353487)
-> Seq Scan on auditrtailreference t6 (cost=0.00..0.00 rows=1 width=336) (actual time=0.000..0.000 rows=0 loops=353487)
Filter: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 68::smallint))
-> Index Scan using "auditrtailreference_2014-12-11_srce_indx1" on "auditrtailreference_2014-12-11" t6 (cost=0.00..15.15 rows=1 width=78) (actual time=0.004..0.005 rows=1 loops=353487)
Index Cond: ((sourceid)::text = (t5.destinationid)::text)
Filter: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 68::smallint))
-> Index Scan using "auditrtailreference_2014-12-12_out_indx1" on "auditrtailreference_2014-12-12" t6 (cost=0.00..7.70 rows=1 width=78) (actual time=0.006..0.012 rows=11 loops=353486)
Index Cond: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone))
Filter: (event = 68::smallint)
-> Materialize (cost=0.00..60063.89 rows=1945 width=50) (actual time=0.000..0.982 rows=3400 loops=353487)
-> Append (cost=0.00..60054.16 rows=1945 width=50) (actual time=0.570..489.242 rows=3400 loops=1)
-> Seq Scan on auditrtailreference t7 (cost=0.00..0.00 rows=1 width=176) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 68::smallint) AND (((outnodename)::text ~~ '%AIR%_distributer'::text) OR ((outnodename)::text ~~ '%AIR%_Arch'::text) OR ((outnodename)::text ~~ '%AIR%_Distributer'::text) OR ((outnodename)::text ~~ '%AIR%_Distributor'::text) OR ((outnodename)::text ~~ '%AIR%_distributor'::text)))
-> Seq Scan on "auditrtailreference_2014-12-11" t7 (cost=0.00..60045.68 rows=1943 width=50) (actual time=0.568..487.333 rows=3399 loops=1)
Filter: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 68::smallint) AND (((outnodename)::text ~~ '%AIR%_distributer'::text) OR ((outnodename)::text ~~ '%AIR%_Arch'::text) OR ((outnodename)::text ~~ '%AIR%_Distributer'::text) OR ((outnodename)::text ~~ '%AIR%_Distributor'::text) OR ((outnodename)::text ~~ '%AIR%_distributor'::text)))
-> Index Scan using "auditrtailreference_2014-12-12_out_indx1" on "auditrtailreference_2014-12-12" t7 (cost=0.00..8.48 rows=1 width=50) (actual time=0.024..0.028 rows=1 loops=1)
Index Cond: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone))
Filter: ((event = 68::smallint) AND (((outnodename)::text ~~ '%AIR%_distributer'::text) OR ((outnodename)::text ~~ '%AIR%_Arch'::text) OR ((outnodename)::text ~~ '%AIR%_Distributer'::text) OR ((outnodename)::text ~~ '%AIR%_Distributor'::text) OR ((outnodename)::text ~~ '%AIR%_distributor'::text)))
-> Materialize (cost=2230028197.20..2259975676.73 rows=5989495905 width=128) (actual time=3921.050..3924.851 rows=3400 loops=1)
-> Sort (cost=2230028197.20..2245001936.96 rows=5989495905 width=128) (actual time=3921.048..3922.459 rows=2576 loops=1)
Sort Key: t4.destinationid
Sort Method: quicksort Memory: 781kB
-> Merge Join (cost=591325.77..90441565.36 rows=5989495905 width=128) (actual time=3784.636..3918.059 rows=2576 loops=1)
Merge Cond: ((t3.sourceid)::text = (t2.destinationid)::text)
-> Sort (cost=308087.95..310241.46 rows=861404 width=120) (actual time=3464.384..3557.638 rows=150872 loops=1)
Sort Key: t3.sourceid
Sort Method: external merge Disk: 35784kB
-> Merge Left Join (cost=123584.91..170172.32 rows=861404 width=120) (actual time=2003.668..2499.298 rows=373330 loops=1)
Merge Cond: ((t3.destinationid)::text = (t4.sourceid)::text)
Join Filter: (t4.outtime >= t3.outtime)
-> Sort (cost=74053.15..74735.00 rows=272740 width=86) (actual time=1798.163..2042.079 rows=373330 loops=1)
Sort Key: t3.destinationid
Sort Method: external merge Disk: 39896kB
-> Append (cost=0.00..49428.59 rows=272740 width=86) (actual time=0.013..785.277 rows=373330 loops=1)
-> Seq Scan on auditrtailreference t3 (cost=0.00..0.00 rows=1 width=344) (actual time=0.001..0.001 rows=0 loops=1)
Filter: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND ((destinationid)::text ~~ '%'::text) AND (event = 68::smallint))
-> Seq Scan on "auditrtailreference_2014-12-11" t3 (cost=0.00..49420.12 rows=272738 width=86) (actual time=0.010..570.958 rows=373319 loops=1)
Filter: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND ((destinationid)::text ~~ '%'::text) AND (event = 68::smallint))
-> Index Scan using "auditrtailreference_2014-12-12_out_indx1" on "auditrtailreference_2014-12-12" t3 (cost=0.00..8.47 rows=1 width=86) (actual time=0.021..0.030 rows=11 loops=1)
Index Cond: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone))
Filter: (((destinationid)::text ~~ '%'::text) AND (event = 68::smallint))
-> Sort (cost=49531.76..49536.49 rows=1895 width=86) (actual time=205.498..207.953 rows=7726 loops=1)
Sort Key: t4.sourceid
Sort Method: quicksort Memory: 459kB
-> Append (cost=0.00..49428.59 rows=1895 width=86) (actual time=0.251..202.613 rows=2576 loops=1)
-> Seq Scan on auditrtailreference t4 (cost=0.00..0.00 rows=1 width=344) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (((innodename)::text ~~ 'AIR%ollect%r'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 67::smallint))
-> Seq Scan on "auditrtailreference_2014-12-11" t4 (cost=0.00..49420.12 rows=1893 width=86) (actual time=0.247..201.086 rows=2576 loops=1)
Filter: (((innodename)::text ~~ 'AIR%ollect%r'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 67::smallint))
-> Index Scan using "auditrtailreference_2014-12-12_out_indx1" on "auditrtailreference_2014-12-12" t4 (cost=0.00..8.47 rows=1 width=86) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone))
Filter: (((innodename)::text ~~ 'AIR%ollect%r'::text) AND (event = 67::smallint))
-> Materialize (cost=283237.82..290191.00 rows=1390636 width=86) (actual time=299.606..302.090 rows=2576 loops=1)
-> Sort (cost=283237.82..286714.41 rows=1390636 width=86) (actual time=299.543..300.435 rows=2576 loops=1)
Sort Key: t2.destinationid
Sort Method: quicksort Memory: 459kB
-> Nested Loop (cost=0.00..74796.60 rows=1390636 width=86) (actual time=0.251..296.114 rows=2576 loops=1)
Join Filter: ((t1.destinationid)::text = (t2.sourceid)::text)
-> Append (cost=0.00..52076.51 rows=940 width=86) (actual time=0.218..202.923 rows=2576 loops=1)
-> Seq Scan on auditrtailreference t1 (cost=0.00..0.00 rows=1 width=344) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (((innodename)::text ~~ 'AIR%FTP'::text) AND ((sourceid)::text !~~ '%my%'::text) AND (intime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (intime <= '2014-12-11 23:59:59'::timestamp without time zone) AND (event = 67::smallint))
-> Seq Scan on "auditrtailreference_2014-12-11" t1 (cost=0.00..52076.51 rows=939 width=86) (actual time=0.216..201.425 rows=2576 loops=1)
Filter: (((innodename)::text ~~ 'AIR%FTP'::text) AND ((sourceid)::text !~~ '%my%'::text) AND (intime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (intime <= '2014-12-11 23:59:59'::timestamp without time zone) AND (event = 67::smallint))
-> Append (cost=0.00..24.13 rows=3 width=164) (actual time=0.007..0.030 rows=11 loops=2576)
-> Seq Scan on auditrtailreference t2 (cost=0.00..0.00 rows=1 width=336) (actual time=0.000..0.000 rows=0 loops=2576)
Filter: (((outnodename)::text !~~ '%CRS%'::text) AND ((outnodename)::text !~~ '%rch'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 80::smallint))
-> Index Scan using "auditrtailreference_2014-12-11_srce_indx1" on "auditrtailreference_2014-12-11" t2 (cost=0.00..15.99 rows=1 width=78) (actual time=0.005..0.006 rows=1 loops=2576)
Index Cond: ((sourceid)::text = (t1.destinationid)::text)
Filter: (((outnodename)::text !~~ '%CRS%'::text) AND ((outnodename)::text !~~ '%rch'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 80::smallint))
-> Index Scan using "auditrtailreference_2014-12-12_out_indx1" on "auditrtailreference_2014-12-12" t2 (cost=0.00..8.14 rows=1 width=78) (actual time=0.004..0.016 rows=10 loops=2576)
Index Cond: ((outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone))
Filter: (((outnodename)::text !~~ '%CRS%'::text) AND ((outnodename)::text !~~ '%rch'::text) AND (event = 80::smallint))
Total runtime: 1059027.764 ms
(90 rows)
You should replace conditions:
outtime between '2014/12/11' AND '2014/12/12'
with:
outtime >= '2014-12-11 00:00:00'::timestamp without time zone AND outtime < '2014-12-12 00:00:00'::timestamp without time zone
The key difference is that the BETWEEN operator is inclusive on both sides, whereas your partitions are defined as inclusive on the left and exclusive on the right.
As a result, every reference to myschema.auditrtailreference in your query scans two partitions (2014-12-11 and 2014-12-12) instead of one, which is visible in the plan as:
-> Index Scan using "auditrtailreference_2014-12-11_srce_indx1" on "auditrtailreference_2014-12-11" t5 (cost=0.00..3842492.30 rows=311382 width=78) (actual time=99.449..896.976 rows=353478 loops=1)
Filter: (((outnodename)::text !~~ '%ESB%'::text) AND ((outnodename)::text !~~ '%Type'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 80::smallint))
-> Index Scan using "auditrtailreference_2014-12-12_srce_indx1" on "auditrtailreference_2014-12-12" t5 (cost=0.00..4034803.06 rows=1 width=78) (actual time=84.927..699.589 rows=9 loops=1)
Filter: (((outnodename)::text !~~ '%ESB%'::text) AND ((outnodename)::text !~~ '%Type'::text) AND (outtime >= '2014-12-11 00:00:00'::timestamp without time zone) AND (outtime <= '2014-12-12 00:00:00'::timestamp without time zone) AND (event = 80::smallint))
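The effect of the inclusive upper bound can be worked out by hand. The sketch below is plain Python rather than the planner's actual logic; the function name daily_partitions_hit is made up, and it assumes every daily partition covers the half-open interval from midnight to the next midnight, as in the CHECK constraints above.

```python
from datetime import datetime, timedelta


def daily_partitions_hit(start, end, end_inclusive):
    """List the daily partitions [day, day + 1) that can hold a timestamp in the range."""
    hit = []
    # Begin at midnight of the day that contains the start of the range.
    day = datetime(start.year, start.month, start.day)
    # A partition qualifies while its first instant is still inside the range.
    while day < end or (end_inclusive and day == end):
        hit.append(day.date().isoformat())
        day += timedelta(days=1)
    return hit


if __name__ == "__main__":
    start = datetime(2014, 12, 11)
    end = datetime(2014, 12, 12)
    # BETWEEN '2014/12/11' AND '2014/12/12' includes the instant 2014-12-12 00:00:00.
    print(daily_partitions_hit(start, end, end_inclusive=True))
    # outtime >= '2014-12-11' AND outtime < '2014-12-12' stops just before it.
    print(daily_partitions_hit(start, end, end_inclusive=False))
```

With the inclusive bound, the single instant 2014-12-12 00:00:00 is enough to pull in the second partition; the half-open range stays within one.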

SQL query slow with 3 LEFT JOINs and a WHERE clause

The first query has one more AND clause than the second, and that slows it down from about 200 ms to 9-13 seconds.
I don't get why.
If I remove the WHERE conditions I get ~200 ms; it only becomes slow once I add that one extra AND condition.
SELECT DISTINCT a.* , p.*, p2.*, p3.*
FROM article a
LEFT JOIN pro p ON a.id = p.article_id
LEFT JOIN pro p2 ON a.id = p2.article_id
LEFT JOIN pro p3 ON a.id = p3.article_id
WHERE a.is_active = true
AND p.name = 'hotel_stars'
AND p2.name = 'article_journey_days'
AND p3.name = 'article_persons'
AND p3.int_value > 0 AND p3.int_value < 7
AND p.int_value > 0 AND p.int_value < 5
Result
319 rows
Total runtime: 9,602.081 ms
SELECT DISTINCT a.* , p.*, p2.*, p3.*
FROM article a
LEFT JOIN property p ON a.id = p.article_id
LEFT JOIN property p2 ON a.id = p2.article_id
LEFT JOIN property p3 ON a.id = p3.article_id
WHERE a.is_active = true
AND p.name = 'hotel_stars'
AND p2.name = 'article_property_journey_days'
AND p3.name = 'article_property_persons'
AND p3.int_value > 0 AND p3.int_value < 7
-- AND p.int_value > 0 AND p.int_value < 5 (removed)
Result
469 rows
Total runtime: 278.453 ms
Where is the problem?
Thanks
EDIT: EXPLAIN plan:
HashAggregate (cost=24113.80..24113.81 rows=1 width=3528)
-> Nested Loop (cost=0.00..24113.69 rows=1 width=3528)
Join Filter: (a.id = p2.article_id)
-> Nested Loop (cost=0.00..16889.70 rows=1 width=2488)
-> Nested Loop (cost=0.00..16856.58 rows=4 width=2080)
Join Filter: (p.article_id = p3.article_id)
-> Seq Scan on property p (cost=0.00..8335.87 rows=115 width=1040)
Filter: ((int_value > 0) AND (int_value < 5) AND ((name)::text = 'hotel_stars'::text))
-> Materialize (cost=0.00..8336.41 rows=107 width=1040)
-> Seq Scan on property p3 (cost=0.00..8335.87 rows=107 width=1040)
Filter: ((int_value > 0) AND (int_value < 7) AND ((name)::text = 'article_property_persons'::text))
-> Index Scan using article_pkey on article a (cost=0.00..8.27 rows=1 width=408)
Index Cond: (id = p.article_id)
Filter: is_active
-> Seq Scan on property p2 (cost=0.00..7185.05 rows=3115 width=1040)
Filter: ((name)::text = 'article_property_journey_days'::text)
16 rows
Total runtime: 11.153 ms
Changing the query to:
SELECT DISTINCT a.* , p.*, p2.*, p3.*
FROM article a
INNER JOIN pro p ON a.id = p.article_id AND p.name = 'hotel_stars' AND p.int_value > 0 AND p.int_value < 5
INNER JOIN pro p2 ON a.id = p2.article_id AND p2.name = 'article_journey_days'
INNER JOIN pro p3 ON a.id = p3.article_id AND p3.name = 'article_persons' AND p3.int_value > 0 AND p3.int_value < 7
WHERE a.is_active = true
Result:
319 rows
Total runtime: 9,315.863 ms
HashAggregate (cost=24113.80..24113.81 rows=1 width=3528)
-> Nested Loop (cost=0.00..24113.69 rows=1 width=3528)
Join Filter: (a.id = p2.article_id)
-> Nested Loop (cost=0.00..16889.70 rows=1 width=2488)
-> Nested Loop (cost=0.00..16856.58 rows=4 width=2080)
Join Filter: (p.article_id = p3.article_id)
-> Seq Scan on property p (cost=0.00..8335.87 rows=115 width=1040)
Filter: ((int_value > 0) AND (int_value < 5) AND ((name)::text = 'hotel_stars'::text))
-> Materialize (cost=0.00..8336.41 rows=107 width=1040)
-> Seq Scan on property p3 (cost=0.00..8335.87 rows=107 width=1040)
Filter: ((int_value > 0) AND (int_value < 7) AND ((name)::text = 'article_property_persons'::text))
-> Index Scan using article_pkey on article a (cost=0.00..8.27 rows=1 width=408)
Index Cond: (id = p.article_id)
Filter: is_active
-> Seq Scan on property p2 (cost=0.00..7185.05 rows=3115 width=1040)
Filter: ((name)::text = 'article_property_journey_days'::text)
16 rows
Total runtime: 4.314 ms
Same result :(
I added an index on the columns p.name and p.article_id.
The runtime improved from 13 s to 180 ms.
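The effect of such an index can be reproduced on a toy table. This is a rough sketch using Python's built-in sqlite3 module rather than PostgreSQL, so the plan text differs from the EXPLAIN output above. The composite index on (name, article_id), the index name property_name_article_idx, and the generated rows are all assumptions; the post does not say whether one composite index or two separate ones were created.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def demo():
    """Compare the plan for the same lookup before and after adding an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE property ("
        "id INTEGER PRIMARY KEY, article_id INTEGER, name TEXT, int_value INTEGER)"
    )
    names = ("hotel_stars", "article_property_persons", "article_property_journey_days")
    rows = [(a, n, a % 7) for a in range(1, 501) for n in names]
    conn.executemany(
        "INSERT INTO property (article_id, name, int_value) VALUES (?, ?, ?)", rows
    )
    query = "SELECT int_value FROM property WHERE name = ? AND article_id = ?"
    before = plan_details(conn, query, ("hotel_stars", 42))
    # Hypothetical composite index covering both lookup columns.
    conn.execute("CREATE INDEX property_name_article_idx ON property (name, article_id)")
    after = plan_details(conn, query, ("hotel_stars", 42))
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("before:", before)  # a full scan of the table
    print("after: ", after)   # a search through the new index
```

Without the index every probe of the table is a full scan, which is the SQLite counterpart of the repeated Seq Scan on property in the plan above.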
