Does SELECT DISTINCT imply Seq Scan? - database

I would like to know if performing a SELECT DISTINCT query implies a sequential scan, and how I can optimize it.
I created a dummy table and confirmed that when there is no index, SELECT DISTINCT does a Seq Scan.
test=# create table test2 (id SERIAL, t1 text);
CREATE TABLE
test=# insert into test2 select generate_series(0, 100000) AS id, md5(random()::text) AS t1;
INSERT 0 100001
test=# explain analyze select distinct t1 from test2;
Results in:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2157.97..2159.97 rows=200 width=32) (actual time=54.086..77.352 rows=100000 loops=1)
Group Key: t1
-> Seq Scan on test2 (cost=0.00..1893.18 rows=105918 width=32) (actual time=0.012..12.232 rows=100001 loops=1)
Planning time: 0.079 ms
Execution time: 86.345 ms
(5 rows)
When we create an index:
test=# create index test2_idx_t1 on test2 (t1);
CREATE INDEX
test=# explain analyze select distinct t1 from test2;
Results in:
first time:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2084.01..2086.01 rows=200 width=32) (actual time=48.871..74.617 rows=100000 loops=1)
Group Key: t1
-> Seq Scan on test2 (cost=0.00..1834.01 rows=100001 width=32) (actual time=0.009..9.891 rows=100001 loops=1)
Planning time: 0.145 ms
Execution time: 83.564 ms
(5 rows)
second time and onwards:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.42..7982.42 rows=100001 width=33) (actual time=0.016..80.949 rows=100000 loops=1)
-> Index Only Scan using test2_idx_t1 on test2 (cost=0.42..7732.42 rows=100001 width=33) (actual time=0.015..53.396 rows=100001 loops=1)
Heap Fetches: 100001
Planning time: 0.053 ms
Execution time: 87.552 ms
(5 rows)
Why is it doing a Seq Scan when queried the first time after the index was created?
Why is the index scan more expensive than the seq scan in this case, and why is the query planner choosing it?

To get the result of a query that is about all the rows in a table, the whole table has to be scanned.
The only way you can avoid a sequential table scan is to have an index on t1 and have a recently vacuumed table so that most blocks are "all visible". Then an "index only scan" can be used, which is usually cheaper.
Why is the index only scan not used right away? I cannot answer that with absolute certainty, but a good guess is that autovacuum was still busy on the table when you ran the query the first time.
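The "Heap Fetches: 100001" line in the second plan fits this explanation: every index entry still needed a visit to the table, which is what you would expect if no page had been marked "all visible" yet. Here is a toy model in Python of that bookkeeping. It is only a sketch, not PostgreSQL internals, and the rows-per-page figure is an assumption made up for illustration:

```python
# Toy model of an index-only scan; not PostgreSQL internals.
ROWS = 100001          # row count from the question
ROWS_PER_PAGE = 120    # assumed value, for illustration only


def heap_fetches(row_pages, all_visible_pages):
    """Count index entries whose heap page is not marked all-visible."""
    return sum(1 for page in row_pages if page not in all_visible_pages)


# Heap page number of each row, assuming rows are packed in insertion order.
row_pages = [i // ROWS_PER_PAGE for i in range(ROWS)]

# Freshly loaded table: no page is marked all-visible yet.
before_vacuum = heap_fetches(row_pages, set())

# After VACUUM: every page is marked all-visible.
after_vacuum = heap_fetches(row_pages, set(row_pages))

print(before_vacuum, after_vacuum)
```

In this model the scan only becomes "index only" in practice once the second number drops towards zero, which is why running VACUUM on test2 and repeating the EXPLAIN ANALYZE is a reasonable next experiment.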

Related

How to index tables for full outer join query based on date column in PostgreSQL 9.4?

Let's say I have 3 tables with millions of rows.
CREATE TABLE blog (
blog_id integer NOT NULL,
blog_definition text,
create_date timestamp without time zone,
user_id integer,
CONSTRAINT "Blog_pkey" PRIMARY KEY (blog_id)
);
CREATE TABLE blog_detail (
blog_detail_id integer NOT NULL,
blog_id integer,
blog_header text,
user_id integer,
blog_content text,
create_date timestamp without time zone,
CONSTRAINT "Blog_Detail_pkey" PRIMARY KEY (blog_detail_id)
);
CREATE TABLE users (
user_id integer NOT NULL,
country text,
user_name text,
CONSTRAINT "User_pkey" PRIMARY KEY (user_id)
);
CREATE INDEX blog_create_date_user_id_blog_definition_idx
ON blog
USING btree
(create_date, user_id, blog_definition COLLATE pg_catalog."default");
CREATE INDEX blog_detail_create_date_user_id_blog_content_blog_header_idx
ON blog_detail
USING btree
(create_date, user_id, blog_content COLLATE pg_catalog."default", blog_header COLLATE pg_catalog."default");
CREATE INDEX users_country_user_id_idx
ON users
USING btree
(country COLLATE pg_catalog."default", user_id);
The query is shown below. With these indexes, it took 35 seconds to return the results.
SELECT b.blog_definition, b.create_date, b.user_id, bd.blog_header,
bd.blog_content, bd.user_id, bd.create_date
FROM blog b
FULL OUTER JOIN blog_detail bd ON b.create_date = bd.create_date
WHERE CASE
WHEN b.blog_id IS NULL THEN
bd.user_id IN (SELECT user_id FROM users WHERE country = 'Greece')
WHEN bd.blog_id IS NULL THEN
b.user_id IN (SELECT user_Id FROM users WHERE country = 'Greece')
END
ORDER BY CASE
WHEN b.blog_id IS NULL THEN bd.create_date
WHEN bd.blog_id IS NULL THEN b.create_date
ELSE b.create_date
END DESC
LIMIT 25;
Which columns in the 3 tables do I need to index (and with what kind of index) to get the best query performance?
EXPLAIN ANALYZE results:
Limit (cost=820038.99..820039.06 rows=25 width=50) (actual time=33047.344..33047.348 rows=25 loops=1)
-> Sort (cost=820038.99..832538.93 rows=4999976 width=50) (actual time=33047.341..33047.343 rows=25 loops=1)
Sort Key: (CASE WHEN (b.blog_id IS NULL) THEN bd.create_date WHEN (bd.blog_id IS NULL) THEN b.create_date ELSE b.create_date END)
Sort Method: top-N heapsort Memory: 26kB
-> Hash Full Join (cost=191546.31..678943.27 rows=4999976 width=50) (actual time=3039.060..28832.090 rows=15000000 loops=1)
Hash Cond: (b.create_date = bd.create_date)
Filter: CASE WHEN (b.blog_id IS NULL) THEN (hashed SubPlan 1) WHEN (bd.blog_id IS NULL) THEN (hashed SubPlan 2) ELSE NULL::boolean END
-> Seq Scan on blog b (cost=0.00..173529.53 rows=9999953 width=22) (actual time=0.035..2090.918 rows=10000000 loops=1)
-> Hash (cost=91666.89..91666.89 rows=4999989 width=28) (actual time=3003.440..3003.440 rows=5000000 loops=1)
Buckets: 8192 Batches: 128 Memory Usage: 2546kB
-> Seq Scan on blog_detail bd (cost=0.00..91666.89 rows=4999989 width=28) (actual time=0.008..1130.650 rows=5000000 loops=1)
SubPlan 1
-> Index Only Scan using users_country_user_id_idx on users (cost=0.56..1496.38 rows=41361 width=4) (actual time=0.050..4.007 rows=20000 loops=1)
Index Cond: (country = 'Germany'::text)
Heap Fetches: 0
SubPlan 2
-> Index Only Scan using users_country_user_id_idx on users users_1 (cost=0.56..1496.38 rows=41361 width=4) (actual time=0.057..4.060 rows=20000 loops=1)
Index Cond: (country = 'Germany'::text)
Heap Fetches: 0
Planning time: 0.253 ms
Execution time: 33048.583 ms
As Couling commented on your question, FULL JOINs tend to be problematic with indexes. That said, there is much to improve in your query:
SELECT b.blog_definition, create_date, b.user_id, bd.blog_header,
bd.blog_content, bd.user_id
FROM blog b
FULL JOIN blog_detail bd USING (create_date)
WHERE EXISTS
(SELECT 1 FROM users
WHERE country = 'Greece' AND user_id = coalesce(bd.user_id, b.user_id))
ORDER BY create_date DESC
LIMIT 25;
When you do a JOIN with the USING clause (instead of ON) then only one of the matching columns is included in the select list, so no need to use aliases. The convoluted ORDER BY clause was unnecessary anyway because b.create_date and bd.create_date are equal by virtue of the join.
The CASE WHEN clause in the WHERE filter can also be avoided by using the coalesce() function and the obvious condition that either table has to have a value for blog_id and one for user_id too (otherwise your query would fail because the filter would evaluate to WHERE NULL). Since b.blog_id is the primary key of table blog it is therefore never NULL so by that same logic b.user_id could never be NULL and you could replace the coalesce() function with the column name. But that is left for you to ponder. If you look at your EXPLAIN ANALYZE you see that the very same sub-query gets evaluated twice (SubPlan 1 and SubPlan 2). This query will access table users only once. That's a proper 4ms saved! Plus another few ms because the sub-query is faster than in your code.
The create_date field is a timestamp. Joining on timestamp equality is only possible if both records were created in the same session or when the value in one of the records is retrieved from the other record, such that their values are exactly the same.
You define an index on table blog_detail, but the index will be quite large because you include two potentially large text fields. Using an index on create_date alone will be much smaller (so fewer disk reads) and faster to process.

How can I optimize this Postgresql count query?

SELECT COUNT(*)
FROM "businesses"
WHERE (businesses.postal_code_id IN
(SELECT id
FROM postal_codes
WHERE lower(city) IN ('los angeles')
AND lower(region) = 'california'))
AND (EXISTS
(SELECT *
FROM categorizations c
WHERE c.business_id=businesses.id
AND c.category_id IN (86)))
I have a Postgres database with businesses, categories, and locations. This query took 95665.9 ms to execute, and I'm pretty sure the bottleneck is in categorizations. Is there a better way to execute this? The resulting count was 1032.
=# EXPLAIN ANALYZE SELECT COUNT(*)
-# FROM "businesses"
-# WHERE (businesses.postal_code_id IN
(# (SELECT id
(# FROM postal_codes
(# WHERE lower(city) IN ('los angeles')
(# AND lower(region) = 'california'));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=4007.74..4007.75 rows=1 width=0) (actual time=263820.923..263820.924 rows=1 loops=1)
-> Nested Loop (cost=41.93..4005.20 rows=1015 width=0) (actual time=469.716..263679.865 rows=112513 loops=1)
-> HashAggregate (cost=15.59..15.60 rows=1 width=4) (actual time=332.664..332.946 rows=82 loops=1)
-> Bitmap Heap Scan on postal_codes (cost=11.57..15.59 rows=1 width=4) (actual time=84.772..332.407 rows=82 loops=1)
Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
-> BitmapAnd (cost=11.57..11.57 rows=1 width=0) (actual time=77.530..77.530 rows=0 loops=1)
-> Bitmap Index Scan on idx_postal_codes_lower_city (cost=0.00..5.66 rows=187 width=0) (actual time=22.800..22.800 rows=82 loops=1)
Index Cond: (lower((city)::text) = 'los angeles'::text)
-> Bitmap Index Scan on idx_postal_codes_lower_region (cost=0.00..5.66 rows=187 width=0) (actual time=54.714..54.714 rows=2356 loops=1)
Index Cond: (lower((region)::text) = 'california'::text)
-> Bitmap Heap Scan on businesses (cost=26.34..3976.91 rows=1015 width=4) (actual time=95.926..3208.426 rows=1372 loops=82)
Recheck Cond: (postal_code_id = postal_codes.id)
-> Bitmap Index Scan on index_businesses_on_postal_code_id (cost=0.00..26.08 rows=1015 width=0) (actual time=89.864..89.864 rows=1380 loops=82)
Index Cond: (postal_code_id = postal_codes.id)
Total runtime: 263821.016 ms
(15 rows)
And the join version:
EXPLAIN ANALYZE SELECT count(*) FROM businesses
LEFT JOIN postal_codes
ON businesses.postal_code_id = postal_codes.id
WHERE lower(postal_codes.city) = 'los angeles'
AND lower(postal_codes.region) = 'california';
-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Aggregate (cost=4006.14..4006.15 rows=1 width=0) (actual time=143357.170..143357.171 rows=1 loops=1)
-[ RECORD 2 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | -> Nested Loop (cost=37.91..4005.19 rows=381 width=0) (actual time=138.666..143218.064 rows=112514 loops=1)
-[ RECORD 3 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | -> Bitmap Heap Scan on postal_codes (cost=11.57..15.59 rows=1 width=4) (actual time=0.559..33.957 rows=82 loops=1)
-[ RECORD 4 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Recheck Cond: ((lower((city)::text) = 'los angeles'::text) AND (lower((region)::text) = 'california'::text))
-[ RECORD 5 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | -> BitmapAnd (cost=11.57..11.57 rows=1 width=0) (actual time=0.532..0.532 rows=0 loops=1)
-[ RECORD 6 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | -> Bitmap Index Scan on idx_postal_codes_lower_city (cost=0.00..5.66 rows=187 width=0) (actual time=0.058..0.058 rows=82 loops=1)
-[ RECORD 7 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Index Cond: (lower((city)::text) = 'los angeles'::text)
-[ RECORD 8 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | -> Bitmap Index Scan on idx_postal_codes_lower_region (cost=0.00..5.66 rows=187 width=0) (actual time=0.461..0.461 rows=2356 loops=1)
-[ RECORD 9 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Index Cond: (lower((region)::text) = 'california'::text)
-[ RECORD 10 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | -> Bitmap Heap Scan on businesses (cost=26.34..3976.91 rows=1015 width=4) (actual time=55.493..1742.407 rows=1372 loops=82)
-[ RECORD 11 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Recheck Cond: (postal_code_id = postal_codes.id)
-[ RECORD 12 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | -> Bitmap Index Scan on index_businesses_on_postal_code_id (cost=0.00..26.09 rows=1015 width=0) (actual time=53.141..53.141 rows=1381 loops=82)
-[ RECORD 13 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Index Cond: (postal_code_id = postal_codes.id)
-[ RECORD 14 ]--------------------------------------------------------------------------------------------------------------------------------------------------------------
QUERY PLAN | Total runtime: 143357.260 ms
The result is much bigger with the simplified query, but given that there are indexes and I'm doing only ONE join, I'm surprised it takes so long.
Try a functional index on the city column:
CREATE INDEX ON postal_codes((lower(city)));
There is a strong dependency between the city and region columns, so sometimes you have to separate these predicates to get more accurate planner estimates. If you need better estimates, add lower_city and lower_region columns to the postal_codes table; PostgreSQL has no statistics over indexes.
If possible, post the execution plan here via http://explain.depesz.com/, that is, the result of EXPLAIN ANALYZE YOUR_QUERY.
9.1 should translate the correlated subquery to a semijoin automatically, but I am not sure. Try rewriting your query from subqueries to an INNER JOIN-only form (it probably won't help, but maybe).
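To see why dependent columns hurt the estimate, here is a rough Python sketch of the arithmetic. It assumes the planner treats the two predicates as independent and multiplies their selectivities, and that estimates are clamped to at least one row. The table size of 40,000 rows is a made-up assumption (the question does not give it); the other numbers come from the plans above:

```python
# Sketch of how multiplying selectivities underestimates correlated predicates.
TABLE_ROWS = 40_000      # assumed size of postal_codes; not given in the question
EST_CITY_ROWS = 187      # planner estimate for lower(city) = 'los angeles'
EST_REGION_ROWS = 187    # planner estimate for lower(region) = 'california'
ACTUAL_ROWS = 82         # rows the bitmap heap scan on postal_codes really returned


def combined_estimate(table_rows, rows_a, rows_b):
    """Row estimate for (A AND B) when A and B are treated as independent."""
    sel_a = rows_a / table_rows
    sel_b = rows_b / table_rows
    # Clamp to at least one row, as the plans above show rows=1.
    return max(1, round(table_rows * sel_a * sel_b))


estimate = combined_estimate(TABLE_ROWS, EST_CITY_ROWS, EST_REGION_ROWS)
print(estimate, ACTUAL_ROWS / estimate)
```

Under these assumptions the combined estimate comes out at 1 row against 82 actual rows, and that error is then multiplied through the nested loop above it (rows=1015 estimated versus 112513 actual).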

Postgresql: inner join takes 70 seconds

I have two tables -
Table A : 1MM rows,
AsOfDate, Id, BId (foreign key to table B)
Table B : 50k rows,
Id, Flag, ValidFrom, ValidTo
Table A contains multiple records per day between 2011/01/01 and 2011/12/31 across 100 BId's.
Table B contains multiple non overlapping (between validfrom and validto) records for 100 Bids.
The task of the join will be to return the flag that was active for the BId on the given AsOfDate.
select
a.AsOfDate, b.Flag
from
A a inner Join B b on
a.BId = b.BId and b.ValidFrom <= a.AsOfDate and b.ValidTo >= a.AsOfDate
where
a.AsOfDate >= 20110101 and a.AsOfDate <= 20111231
This query takes ~70 seconds on a very high end server (+3Ghz) with 64Gb of memory.
I have indexes on every combination of field as I'm testing this - to no avail.
Indexes : a.AsOfDate, a.AsOfDate+a.bId, a.bid
Indexes : b.bid, b.bid+b.validfrom
I also tried the range queries suggested below (62 seconds).
This same query on the free version of Sql Server running in a VM takes ~1 second to complete.
any ideas?
Postgres 9.2
Query Plan
QUERY PLAN
---------------------------------------------------------------------------------------
Aggregate (cost=8274298.83..8274298.84 rows=1 width=0)
-> Hash Join (cost=1692.25..8137039.36 rows=54903787 width=0)
Hash Cond: (a.bid = b.bid)
Join Filter: ((b.validfrom <= a.asofdate) AND (b.validto >= a.asofdate))
-> Seq Scan on "A" a (cost=0.00..37727.00 rows=986467 width=12)
Filter: ((asofdate > 20110101) AND (asofdate < 20111231))
-> Hash (cost=821.00..821.00 rows=50100 width=12)
-> Seq Scan on "B" b (cost=0.00..821.00 rows=50100 width=12)
see http://explain.depesz.com/s/1c5 for the analyze output
Consider using the range types available in PostgreSQL 9.2:
create index on a using gist(int4range(asofdate, asofdate, '[]'));
create index on b using gist(int4range(validfrom, validto, '[]'));
You can query for a date matching a range like so:
select * from a
where int4range(asofdate,asofdate,'[]') && int4range(20110101, 20111231, '[]');
And for rows in b overlapping a record in a like so:
select *
from b
join a on int4range(b.validfrom,b.validto,'[]') @> a.asofdate
where a.id = 1
(&& means "overlaps", @> means "contains", and '[]' indicates a range that includes both end points)
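To make the intended join semantics concrete, here is a small Python model of "find the flag whose inclusive [ValidFrom, ValidTo] range contains AsOfDate for the same BId". The sample rows are invented, and the binary search relies on the question's statement that the ranges for one BId do not overlap. It illustrates the lookup only and says nothing about how PostgreSQL executes the query:

```python
from bisect import bisect_right
from collections import defaultdict

# Invented sample rows: (bid, valid_from, valid_to, flag), inclusive integer dates.
b_rows = [
    (1, 20110101, 20110630, "X"),
    (1, 20110701, 20111231, "Y"),
    (2, 20110101, 20111231, "Z"),
]
# Invented sample rows: (as_of_date, bid).
a_rows = [(20110315, 1), (20110701, 1), (20110820, 2), (20120101, 2)]

# Group B by bid, sorted by valid_from, so each lookup is a binary search.
ranges = defaultdict(list)
for bid, valid_from, valid_to, flag in sorted(b_rows):
    ranges[bid].append((valid_from, valid_to, flag))
starts = {bid: [r[0] for r in rs] for bid, rs in ranges.items()}


def flag_for(bid, as_of_date):
    """Return the flag whose inclusive range contains as_of_date, else None."""
    if bid not in ranges:
        return None
    # Last range that starts on or before as_of_date.
    i = bisect_right(starts[bid], as_of_date) - 1
    if i < 0:
        return None
    valid_from, valid_to, flag = ranges[bid][i]
    return flag if as_of_date <= valid_to else None


# Inner-join behaviour: rows of A without a matching range are dropped.
result = [(d, flag_for(bid, d)) for d, bid in a_rows if flag_for(bid, d) is not None]
print(result)
```

Note that the lookup is keyed by BId first and by date second, which is the same shape as a composite index on (bid, validfrom).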
The issue was with the indexes. For some reason unclear to me, the indexes on the tables were not being used correctly by the query planner. I removed them all, added them back (exactly the same, via script), and the query now takes ~303 ms.
Thanks for all the help on this very frustrating problem.

Why is row count 0 in my PostgreSQL plan?

I have a query that equi-joins two tables, TableA and TableB, using a nested loop. Because of the equi-join constraint, every row in the result must correspond to at least one row from each of the two tables. However, according to the plan (EXPLAIN ANALYZE), the actual row count from TableB is 0, even though a row is returned in the final result. How can the actual row count be zero here?
Here is the execution plan:
=> explain analyze select p.id, p.title, s.count from products p, stock s where p.id = s.p_id and s.w_id = 6 and p.type = 9 and s.count > 0 order by p.title;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Sort (cost=42.42..42.42 rows=2 width=36) (actual time=0.198..0.199 rows=1 loops=1)
Sort Key: p.title
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=0.00..42.41 rows=2 width=36) (actual time=0.170..0.181 rows=1 loops=1)
-> Seq Scan on products p (cost=0.00..9.25 rows=4 width=32) (actual time=0.068..0.106 rows=4 loops=1)
Filter: (type = 9)
-> Index Scan using stock_pk on stock s (cost=0.00..8.28 rows=1 width=8) (actual time=0.015..0.015 rows=0 loops=4)
Index Cond: ((w_id = 6) AND (p_id = p.id))
Filter: (count > 0)
Total runtime: 0.290 ms
And the two table definitions... The products table first:
=> \d products
Table "public.products"
Column | Type | Modifiers
--------+------------------------+-----------
id | integer | not null
title | character varying(100) |
type | integer |
price | double precision |
filler | character(500) |
Indexes:
"products_pkey" PRIMARY KEY, btree (id)
"products_type_idx" btree (type)
Referenced by:
TABLE "orderline" CONSTRAINT "orderline_p_id_fkey" FOREIGN KEY (p_id) REFERENCES products(id)
TABLE "stock" CONSTRAINT "stock_p_id_fkey" FOREIGN KEY (p_id) REFERENCES products(id)
The stock table:
=> \d stock
Table "public.stock"
Column | Type | Modifiers
--------+---------+-----------
w_id | integer | not null
p_id | integer | not null
count | integer |
Indexes:
"stock_pk" PRIMARY KEY, btree (w_id, p_id)
"stock_p_id_idx" btree (p_id)
Foreign-key constraints:
"stock_p_id_fkey" FOREIGN KEY (p_id) REFERENCES products(id)
"stock_w_id_fkey" FOREIGN KEY (w_id) REFERENCES warehouses(id)
The actual rows value shown for the inner index scan is the average number of rows returned per execution of that node.
Looking at http://www.postgresql.org/docs/current/static/using-explain.html:
In some query plans, it is possible for a subplan node to be executed more than once. For example, the inner index scan is executed once per outer row in the above nested-loop plan. In such cases, the loops value reports the total number of executions of the node, and the actual time and rows values shown are averages per-execution. This is done to make the numbers comparable with the way that the cost estimates are shown. Multiply by the loops value to get the total time actually spent in the node.
I'm not sure how it's rounded (I'm guessing down to the nearest int, after averaging), but it might be that most rows in products don't have a corresponding row in stock.
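The arithmetic for the plan in the question can be sketched in a few lines of Python. The join produced 1 row and the inner index scan ran 4 times (loops=4), so the per-loop average is 0.25. The formatting step below assumes the value is printed with a "%.0f"-style format, which rounds to the nearest integer; that detail is my assumption about the PostgreSQL release in use, not something the quoted documentation states:

```python
def displayed_rows(total_rows, loops):
    """Per-loop average formatted with zero decimals (assumed '%.0f' style)."""
    return format(total_rows / loops, ".0f")


# Inner index scan on stock: 1 matching row in total over 4 executions.
inner_scan = displayed_rows(1, 4)

# For comparison: one matching row per execution would display as 1.
one_per_loop = displayed_rows(4, 4)

print(inner_scan, one_per_loop)
```

So a displayed rows=0 with loops=4 only says that fewer than about half a row was returned per execution on average, not that the node returned nothing.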

Postgres discrepancy in trigger execution speed?

I have a trigger that executes a function on table insert or update. It looks like this:
CREATE OR REPLACE FUNCTION func_fk_location_area()
RETURNS "trigger" AS $$
BEGIN
IF EXISTS (
-- there was a row valid in area when location started
SELECT * FROM location
WHERE NOT EXISTS (
SELECT * FROM area
WHERE area.key=location.key
AND area.id=location.area_id
AND ( (area.tr_from<=location.tr_from AND area.tr_until>location.tr_from) OR
(area.tr_from=location.tr_from AND area.tr_until=location.tr_from)))
) OR EXISTS (
-- there was a row valid in area when location ended
SELECT * FROM location
WHERE NOT EXISTS (
SELECT * FROM area
WHERE area.key=location.key
AND area.id=location.area_id
AND ( (area.tr_from<location.tr_until AND area.tr_until>=location.tr_until) OR
(area.tr_from=location.tr_until AND area.tr_until=location.tr_until)))
)
THEN
RAISE EXCEPTION 'FK location_area integrity violation.';
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_fk_area_location AFTER DELETE OR UPDATE ON area
FOR EACH ROW EXECUTE PROCEDURE func_fk_location_area();
CREATE TRIGGER trigger_fk_location_area AFTER INSERT OR UPDATE ON location
FOR EACH ROW EXECUTE PROCEDURE func_fk_location_area();
When I insert a row, it seems to run very slowly. Using explain analyze I determined that this trigger was taking nearly 400ms to complete.
Result (cost=0.00..0.03 rows=1 width=0) (actual time=0.026..0.029 rows=1 loops=1)
Trigger for constraint location_fkey_tr_by: time=0.063 calls=1
Trigger trigger_fk_location_area: time=361.878 calls=1
Trigger trigger_update_objects_location: time=355.033 calls=1
Total runtime: 717.229 ms
(5 rows)
However, if I run the two lots of SQL in the function, they each only take 3 or 4ms to run!
FIRST PART:
mydb=# explain analyze
mydb-# SELECT * FROM location
mydb-# WHERE NOT EXISTS (
mydb(# SELECT * FROM area
mydb(# WHERE area.key=location.key
mydb(# AND area.id=location.area_id
mydb(# AND ( (area.tr_from<location.tr_until AND area.tr_until>=location.tr_until) OR
mydb(# (area.tr_from=location.tr_until AND area.tr_until=location.tr_until)));
Hash Anti Join (cost=14.68..146.84 rows=1754 width=126) (actual time=5.512..5.512 rows=0 loops=1)
Hash Cond: ((location.key = area.key) AND (location.area_id = area.id))
Join Filter: (((area.tr_from < location.tr_until) AND (area.tr_until >= location.tr_until)) OR ((area.tr_from = location.tr_until) AND (area.tr_until = location.tr_until)))
-> Seq Scan on location (cost=0.00..79.91 rows=2391 width=126) (actual time=0.005..1.016 rows=2393 loops=1)
-> Hash (cost=8.87..8.87 rows=387 width=37) (actual time=0.497..0.497 rows=387 loops=1)
-> Seq Scan on area (cost=0.00..8.87 rows=387 width=37) (actual time=0.004..0.250 rows=387 loops=1)
Total runtime: 5.562 ms
(7 rows)
SECOND PART:
mydb=# explain analyze
mydb-# SELECT * FROM location
mydb-# WHERE NOT EXISTS (
mydb(# SELECT * FROM area
mydb(# WHERE area.key=location.key
mydb(# AND area.id=location.area_id
mydb(# AND ( (area.tr_from<location.tr_until AND area.tr_until>=location.tr_until) OR
mydb(# (area.tr_from=location.tr_until AND area.tr_until=location.tr_until)));
Hash Anti Join (cost=14.68..146.84 rows=1754 width=126) (actual time=5.666..5.666 rows=0 loops=1)
Hash Cond: ((location.key = area.key) AND (location.area_id = area.id))
Join Filter: (((area.tr_from < location.tr_until) AND (area.tr_until >= location.tr_until)) OR ((area.tr_from = location.tr_until) AND (area.tr_until = location.tr_until)))
-> Seq Scan on location (cost=0.00..79.91 rows=2391 width=126) (actual time=0.005..1.072 rows=2393 loops=1)
-> Hash (cost=8.87..8.87 rows=387 width=37) (actual time=0.509..0.509 rows=387 loops=1)
-> Seq Scan on area (cost=0.00..8.87 rows=387 width=37) (actual time=0.007..0.239 rows=387 loops=1)
Total runtime: 5.725 ms
(7 rows)
This makes no sense to me.
Any thoughts?
Thanks.
You're setting up the trigger to run for each row, and then inside the trigger function you're doing another select on the whole table. Do one or the other. (Try changing FOR EACH ROW to FOR EACH STATEMENT.)
It looks like postgres may sometimes create a different plan if the query has been prepared for the function. If I change the function to actually execute the SQL then it creates a new plan every time and it does operate much faster for my particular scenario (strangely!)
This basically solves my problem:
CREATE OR REPLACE FUNCTION func_fk_location_area()
RETURNS "trigger" AS $$
DECLARE
myst TEXT;
mysv TEXT;
myrec RECORD;
BEGIN
myst := 'SELECT id FROM location WHERE NOT EXISTS (SELECT id FROM area WHERE area.key=location.key AND area.id=location.area_id ';
mysv := 'AND ((area.tr_from<=location.tr_from AND area.tr_until>location.tr_from) OR (area.tr_from=location.tr_from AND area.tr_until=location.tr_from)))';
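-- EXECUTE does not set FOUND, so fetch the first offending row (if any) into myrec.
-- This check assumes location.id is never NULL.
EXECUTE myst || mysv INTO myrec;
IF myrec.id IS NOT NULL THEN
RAISE EXCEPTION 'FK location_area integrity violation.';
END IF;
mysv := 'AND ((area.tr_from<location.tr_until AND area.tr_until>=location.tr_until) OR (area.tr_from=location.tr_until AND area.tr_until=location.tr_until)))';
EXECUTE myst || mysv INTO myrec;
IF myrec.id IS NOT NULL THEN
RAISE EXCEPTION 'FK location_area integrity violation.';
END IF;
-- The return value of an AFTER row trigger is ignored.
RETURN NULL;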
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trigger_fk_area_location AFTER DELETE OR UPDATE ON area
FOR EACH ROW EXECUTE PROCEDURE func_fk_location_area();
CREATE TRIGGER trigger_fk_location_area AFTER INSERT OR UPDATE ON location
FOR EACH ROW EXECUTE PROCEDURE func_fk_location_area();

Resources