I'm using the PostGIS extension for Postgres and trying to optimize a query that searches for points within a circle.
Consider this table with an index:
create table position
(
    id bigserial not null primary key,
    date timestamp with time zone,
    point GEOMETRY(Point, 4326),
    alias varchar(50)
);
create index position_point_idx on position using gist (point);
Now when I use a query with a polygon, everything works as expected. In the explain plan I can see that the query uses the index.
SELECT distinct alias
FROM position
WHERE date > '2021-11-28T19:26:18.574Z'
AND date < '2021-11-28T20:26:18.574Z'
AND ST_contains(ST_GeomFromText(
'POLYGON ((13.970947489142418 49.59174558308953, 13.970947489142418 50.12515341892287, 15.208740681409838 50.12515341892287, 15.208740681409838 49.59174558308953, 13.970947489142418 49.59174558308953))',
4326), point);
-> Bitmap Index Scan on position_point_idx (cost=0.00..183.82 rows=5254 width=0) (actual time=5.981..5.981 rows=94462 loops=1)
OK, now I want to search for aliases within a circle, but for some reason it takes seconds and doesn't use the index at all.
SELECT distinct alias
FROM position
WHERE
date > '2021-11-28T19:26:18.574Z' AND date < '2021-11-28T20:26:18.574Z'
AND
ST_DWithin(point, ST_GeomFromText('POINT (14.32983409613371 49.91815471231952)', 4326), 62815.14152820495);
ST_DWithin is in the list of functions that can use a spatial index, so it should use it, but it's being ignored.
What am I doing wrong here? Thanks for any hint.
Here is my query plan
HashAggregate (cost=687537.59..687538.59 rows=100 width=9) (actual time=2874.991..2875.003 rows=100 loops=1)
Output: alias
" Group Key: ""position"".alias"
-> Gather (cost=1000.00..686702.70 rows=333955 width=9) (actual time=0.254..2041.354 rows=5008801 loops=1)
Output: alias
Workers Planned: 2
Workers Launched: 2
" -> Parallel Seq Scan on public.""position"" (cost=0.00..652307.20 rows=139148 width=9) (actual time=0.021..2117.644 rows=1669600 loops=3)"
Output: alias
" Filter: ((""position"".date > '2021-11-28 19:26:18.574+00'::timestamp with time zone) AND (""position"".date < '2021-11-28 20:26:18.574+00'::timestamp with time zone) AND (""position"".point && '0103000020E6100000010000000500000077EC65F919AAEEC0B42AE025A7A5EEC077EC65F919AAEEC03A26ECE821B2EE4077646615AFADEE403A26ECE821B2EE4077646615AFADEE40B42AE025A7A5EEC077EC65F919AAEEC0B42AE025A7A5EEC0'::geometry) AND ('0101000020E61000000100C003E0A82C40520AF71786F54840'::geometry && st_expand(""position"".point, '62815.1415282049493'::double precision)) AND _st_dwithin(""position"".point, '0101000020E61000000100C003E0A82C40520AF71786F54840'::geometry, '62815.1415282049493'::double precision))"
Rows Removed by Filter: 86028
Worker 0: actual time=0.023..2492.854 rows=1922778 loops=1
Worker 1: actual time=0.025..2493.448 rows=2024544 loops=1
Planning Time: 0.211 ms
Execution Time: 2876.783 ms
PostgreSQL chooses a sequential scan because it thinks that is the most efficient access strategy, and I would say it is right. After all, the WHERE condition removed only about 250,000 out of approximately 5 million rows.
I think you wanted to use geography, not geometry. In geometry 4326, the entire earth (and the rest of the universe, I suppose) is well within 62815.14152820495 degrees of every other point, so the index would be profoundly ineffective.
If you were using geography, that would be 39 miles, for which the index would be useful, and in my hands it would be used.
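A minimal sketch of the geography-based variant, assuming the same table (the index name is mine): with geography, ST_DWithin interprets the distance in meters, and an expression index on the cast lets the planner use it.
-- Expression index on the geography cast (hypothetical name):
CREATE INDEX position_point_geog_idx
    ON position USING gist ((point::geography));

SELECT DISTINCT alias
FROM position
WHERE date > '2021-11-28T19:26:18.574Z'
  AND date < '2021-11-28T20:26:18.574Z'
  AND ST_DWithin(point::geography,
                 ST_GeomFromText('POINT (14.32983409613371 49.91815471231952)', 4326)::geography,
                 62815.14152820495);  -- now ~62.8 km, not degrees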
The stats on your table also seem to be way off.
I have a table with the below structure and indexes:
Table "public.client_data"
       Column       |  Type  | Modifiers
--------------------+--------+-----------
 account_id         | text   | not null
 client_id          | text   | not null
 client_type        | text   | not null
 creation_time      | bigint | not null
 last_modified_time | bigint | not null
Indexes:
"client_data_pkey" PRIMARY KEY, btree (account_id, client_id)
"client_data_last_modified_time_index" btree (last_modified_time)
From this table I need to find the oldest record - for this I used the following query:
SELECT last_modified_time FROM client_data ORDER BY last_modified_time ASC LIMIT 1;
However, this query on this table of around 61 million rows runs very slowly (90-100 minutes) on a db.r4.2xlarge RDS instance in AWS Aurora Postgres 9.6, with no other concurrent queries running.
Changing the query to use DESC, however, finishes instantly. What could be the problem? I was expecting that, since I have an index on last_modified_time, querying only that column ordered by that column with the limit applied would be an index-only scan that stops after the first entry in the index.
Here is the output of the explain analyze:
EXPLAIN ANALYZE SELECT last_modified_time FROM client_data ORDER BY last_modified_time ASC LIMIT 1;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..2.31 rows=1 width=8) (actual time=6297823.287..6297823.287 rows=1 loops=1)
-> Index Only Scan using client_data_last_modified_time_index on client_data (cost=0.57..1049731749.38 rows=606590292 width=8) (actual time=6297823.287..6297823.287 rows=1 loops=1)
Heap Fetches: 26575013
Planning time: 0.078 ms
Execution time: 6297823.306 ms
The same for the DESC version of the query results in the following
EXPLAIN ANALYZE SELECT last_modified_time FROM client_data ORDER BY last_modified_time DESC LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..2.32 rows=1 width=8) (actual time=2.265..2.266 rows=1 loops=1)
-> Index Only Scan Backward using client_data_last_modified_time_index on client_data (cost=0.57..1066049674.69 rows=611336085 width=8) (actual time=2.264..2.264 rows=1 loops=1)
Heap Fetches: 9
Planning time: 0.095 ms
Execution time: 2.278 ms
Any pointers?
The difference is this:
The slow plan has
Heap Fetches: 26575013
and the fast plan
Heap Fetches: 9
Heap fetches are what turn a fast index-only scan into a slow normal index scan.
Did the table experience mass updates or deletions recently?
The reason for the slow scan is that it has to wade through a lot of invisible (deleted) tuples before it hits the first match.
Run VACUUM on the table, and both scans will be fast.
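A quick sketch of the check and the fix (table name from the question):
-- See when the table was last vacuumed, manually or by autovacuum:
SELECT relname, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'client_data';

-- VACUUM removes dead tuples and updates the visibility map, so
-- index-only scans stop needing heap fetches for all-visible pages:
VACUUM client_data;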
I am trying to figure out whether a username given by the frontend is a valid name or not.
I have a table that contains a lot of names.
So, for example, given Adam18, I need to give an answer in real time (< 500 ms).
My query:
SELECT * FROM names WHERE 'Adam18' ILIKE '%' || name || '%'
The query is correct, but it uses a sequential scan.
Explain result:
Seq Scan on names (cost=0.00..2341.20 rows=527 width=516) (actual time=1.452..24.774 rows=12 loops=1)
Filter: ('Adam18'::text ~~ (('%'::text || (name)::text) || '%'::text))
Rows Removed by Filter: 105314
Buffers: shared hit=498
Planning time: 0.062 ms
Execution time: 24.796 ms
Is there a way to create index on this case?
My current index:
CREATE INDEX names_gin_idx ON names USING gin (name gin_trgm_ops)
This index cannot be used for the query above. Can you help me?
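For contrast, a hypothetical sketch of the query shape that names_gin_idx can accelerate: a trigram index applies when the indexed column is matched against a pattern, which is the reverse of the query above, where the pattern is built from the column.
-- This form can use the trigram GIN index:
EXPLAIN SELECT * FROM names WHERE name ILIKE '%dam%';
-- The original query ('Adam18' ILIKE '%' || name || '%') swaps the
-- column and the pattern, so the index cannot help it.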
I have a daily-partitioned table in PostgreSQL. It uses cdr_date for partitioning. When I run a simple query, it takes a long time and I don't know why!
This is the simple SQL:
EXPLAIN (ANALYZE , BUFFERS )
select * FROM cdr
WHERE cdr_date >= '2018-05-24 11:59:00.937000 +00:00'
AND cdr_date <= '2018-05-25 23:59:59.937000 +00:00'
and its result:
Append (cost=0.56..1036393.46 rows=14908437 width=295) (actual time=5019.283..335535.305 rows=15191628 loops=1)
Buffers: shared hit=252735 read=1443977 written=125
-> Index Scan using ind_cdr_cdr_date on cdr (cost=0.56..8.58 rows=1 width=286) (actual time=5019.190..5019.190 rows=0 loops=1)
Index Cond: ((cdr_date >= '2018-05-24 11:59:00.937+00'::timestamp with time zone) AND (cdr_date <= '2018-05-25 23:59:59.937+00'::timestamp with time zone))
Buffers: shared hit=178464 read=708130 written=125
-> Index Scan using ind_cdr_2018_05_24 on cdr_2018_05_24 (cost=0.43..567998.02 rows=7158579 width=295) (actual time=0.091..311773.252 rows=7846816 loops=1)
Index Cond: ((cdr_date >= '2018-05-24 11:59:00.937+00'::timestamp with time zone) AND (cdr_date <= '2018-05-25 23:59:59.937+00'::timestamp with time zone))
Buffers: shared hit=74264 read=383715
-> Seq Scan on cdr_2018_05_25 (cost=0.00..468386.85 rows=7749857 width=295) (actual time=5.192..16189.737 rows=7344812 loops=1)
Filter: ((cdr_date >= '2018-05-24 11:59:00.937+00'::timestamp with time zone) AND (cdr_date <= '2018-05-25 23:59:59.937+00'::timestamp with time zone))
Buffers: shared hit=7 read=352132
Planning time: 3.394 ms
Execution time: 336984.703 ms
here is my root table
CREATE TABLE cdr
(
id BIGSERIAL NOT NULL
CONSTRAINT cdr_pkey
PRIMARY KEY,
username VARCHAR(256) NOT NULL,
user_id BIGINT,
cdr_date TIMESTAMP WITH TIME ZONE NOT NULL,
created_at TIMESTAMP WITH TIME ZONE NOT NULL,
last_reset_time TIMESTAMP WITH TIME ZONE,
prev_cdr_date TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE INDEX ind_cdr_user_id
ON cdr (user_id);
CREATE INDEX ind_cdr_cdr_date
ON cdr (cdr_date);
and here is one of the child tables:
-- auto-generated definition
CREATE TABLE cdr_2018_05_25
(
CONSTRAINT cdr_2018_05_25_cdr_date_check
CHECK ((cdr_date >= '2018-05-25 00:00:00+00' :: TIMESTAMP WITH TIME ZONE) AND
(cdr_date <= '2018-05-26 00:23:29.064408+00' :: TIMESTAMP WITH TIME ZONE))
)
INHERITS (cdr);
CREATE INDEX ind_cdr_2018_05_25_user_id
ON cdr_2018_05_25 (user_id);
CREATE INDEX ind_cdr_2018_05_25
ON cdr_2018_05_25 (cdr_date);
Because your partition is big, and you're basically selecting most of the data in the partition.
The filter is not equal to the check, so after it determines which partition to use, it still scans the index.
There are 3 solutions that I can propose that can work together:
Don't partition on ranges with such a high resolution. Consider adding another field that holds just the DATE component, and write the check with an equality operator instead (see the sketch after this list). This also ensures that your partitions don't overlap the way they do here. It won't help much in this exact case, though, unless you really want to select all the data from a single partition.
Cluster the table on the cdr_date index, which will drastically speed up such queries.
CLUSTER cdr_2018_05_24 USING ind_cdr_2018_05_24
Consider partitioning the partitions, by hour, in case you often select smaller time ranges. 7 million rows are quite a lot for such a query.
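A minimal sketch of the first suggestion, assuming a new cdr_day DATE column (all names here are mine, not from the question):
-- An equality check keeps partitions non-overlapping and lets
-- constraint exclusion pick partitions exactly:
ALTER TABLE cdr ADD COLUMN cdr_day DATE;

CREATE TABLE cdr_2018_05_25_by_day
(
    CONSTRAINT cdr_2018_05_25_by_day_check
        CHECK (cdr_day = DATE '2018-05-25')
)
INHERITS (cdr);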
There is no way it should take 5 seconds to find 0 rows on an index scan of the root table. I would say your root table (or indexes, anyway) is massively bloated. And if that is the case, maybe your other ones are as well. Are you vacuuming these tables sufficiently, or even at all? Look in pg_stat_user_tables for the last time they were vacuumed, either manually or auto.
I need help or any hint. I have a Postgres 9.4 database with one query that SOMETIMES runs very slowly.
SELECT COUNT(*) FROM "table_a" INNER JOIN "table_b" ON "table_b"."id" = "table_a"."table_b_id" AND "table_b"."deleted_at" IS NULL WHERE "table_a"."deleted_at" IS NULL AND "table_b"."company_id" = ? AND "table_a"."company_id" = ?
Query plan for this -
Aggregate (cost=308160.70..308160.71 rows=1 width=0)
-> Hash Join (cost=284954.16..308160.65 rows=20 width=0)
Hash Cond: ?
-> Bitmap Heap Scan on table_a (cost=276092.39..299260.96 rows=6035 width=4)
Recheck Cond: ?
Filter: ?
-> Bitmap Index Scan on index_table_a_on_created_at_and_company_id (cost=0.00..276090.89 rows=6751 width=0)
Index Cond: ?
-> Hash (cost=8821.52..8821.52 rows=3220 width=4)
-> Bitmap Heap Scan on table_b (cost=106.04..8821.52 rows=3220 width=4)
Recheck Cond: ?
Filter: ?
-> Bitmap Index Scan on index_ table_b_on_company_id (cost=0.00..105.23 rows=3308 width=0)
Index Cond: ?
But usually this query executes fast enough (about 69.7 ms). I don't understand why this happens sometimes. In the performance logs for these periods I saw that my RDS instance consumes a lot of memory and the rate of these queries reaches about 100 per second. So, any help please: where should I look to solve this problem?
I am not sure if this will solve your problem or not :)
When this query returns a result very quickly, it is returning the result from cache rather than executing the query and preparing the result again.
First of all, you have to check whether too many queries are being executed against these tables, especially inserts/updates/deletes. Such queries cause locking, and the SELECT has to wait until the lock is released.
The query can also be slow because of the comparison cost of the join and WHERE clauses between table_a and table_b.
You can reduce that cost by creating indexes on the columns "table_b"."id", "table_a"."table_b_id", "table_a"."deleted_at", "table_b"."company_id", and "table_a"."company_id".
You can create a materialized view to reduce the cost as well, since materialized views return precomputed (cached) results.
One last thing: you can reduce the cost by using temporary tables as well. I have given an example below.
QUERIES:
CREATE TEMPORARY TABLE table_a_temp as
SELECT "table_a"."table_b_id" FROM "table_a"
WHERE "table_a"."deleted_at" IS NULL AND "table_a"."company_id" = ? ;
CREATE TEMPORARY TABLE table_b_temp as
SELECT "table_b"."id" FROM "table_b"
WHERE "table_b"."deleted_at" IS NULL AND "table_b"."company_id" = ?;
SELECT COUNT(*) FROM "table_a_temp" INNER JOIN "table_b_temp"
ON "table_b_temp"."id" = "table_a_temp"."table_b_id" ;
I have a table quite like this:
CREATE TABLE myTable (
family text,
names text[]
)
I can search like this:
SELECT family
FROM myTable where names #> array['B0WP04'];
But I would like to do:
SELECT family
FROM myTable where names #> array['%P0%'];
Is this possible?
In PostgreSQL 9.3 you can:
select family
from myTable
join lateral unnest(mytable.names) as un(name) on true
where un.name like '%P0%';
But keep in mind that it can produce duplicates, so perhaps you'd like to add distinct.
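For example, a sketch of the same query with the duplicates removed:
select distinct family
from myTable
join lateral unnest(mytable.names) as un(name) on true
where un.name like '%P0%';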
For earlier versions:
select family
from myTable where
exists (select 1 from unnest(names) as un(name) where un.name like '%P0%');
Adding a bit to Radek's answer, I tried
select family
from myTable where
exists (select 1 from unnest(names) as name where name like '%P0%');
and it also works. I searched the PostgreSQL docs for the un() function but couldn't find anything.
I'm not saying it doesn't do anything; I'm just curious about what the un() function is supposed to do (and happy to have my problem solved).
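For what it's worth, a sketch suggesting un(name) is alias syntax rather than a function: in unnest(names) as un(name), un is a table alias for the set returned by unnest, and name is its column alias, so any names work.
-- t is the table alias, n the column alias:
select family
from myTable
where exists (
    select 1 from unnest(names) as t(n) where t.n like '%P0%'
);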
You can use the parray_gin extension https://github.com/theirix/parray_gin
This extension is said to work only up to 9.2 but I just installed and tested it on 9.3 and it works well.
Here is how to install it on Ubuntu-like systems :)
# install postgresql extension network client and postgresql extension build tools
sudo apt-get install python-setuptools
easy_install pgxnclient
sudo apt-get install postgresql-server-dev-9.3
# get the extension
pgxn install parray_gin
And here is my test
-- as a superuser: add the extension to the current database
CREATE EXTENSION parray_gin;
-- as a normal user
CREATE TABLE test (
id SERIAL PRIMARY KEY,
names TEXT []
);
INSERT INTO test (names) VALUES
(ARRAY ['nam1', 'nam2']),
(ARRAY ['2nam1', '2nam2']),
(ARRAY ['Hello', 'Woooorld']),
(ARRAY ['Woooorld', 'Hello']),
(ARRAY [] :: TEXT []),
(NULL),
(ARRAY ['Hello', 'is', 'it', 'me', 'you''re', 'looking', 'for', '?']);
-- double up the rows in the test table (run repeatedly); with many rows, the index is used
INSERT INTO test (names) (SELECT names FROM test);
SELECT count(*) from test; /*
count
--------
997376
(1 row)
*/
Now that we have some test data, it's magic time:
-- http://pgxn.org/dist/parray_gin/doc/parray_gin.html
CREATE INDEX names_idx ON test USING GIN (names parray_gin_ops);
--- now it's time for some tests
EXPLAIN ANALYZE SELECT * FROM test WHERE names #> ARRAY ['is']; /*
-- WITHOUT INDEX ON NAMES
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..25667.00 rows=1138 width=49) (actual time=0.021..508.599 rows=51200 loops=1)
Filter: (names #> '{is}'::text[])
Rows Removed by Filter: 946176
Total runtime: 653.879 ms
(4 rows)
-- WITH INDEX ON NAMES
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=455.73..3463.37 rows=997 width=49) (actual time=14.327..240.365 rows=51200 loops=1)
Recheck Cond: (names #> '{is}'::text[])
-> Bitmap Index Scan on names_idx (cost=0.00..455.48 rows=997 width=0) (actual time=12.241..12.241 rows=51200 loops=1)
Index Cond: (names #> '{is}'::text[])
Total runtime: 341.750 ms
(5 rows)
*/
EXPLAIN ANALYZE SELECT * FROM test WHERE names ##> ARRAY ['%nam%']; /*
-- WITHOUT INDEX ON NAMES
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..23914.20 rows=997 width=49) (actual time=0.023..590.093 rows=102400 loops=1)
Filter: (names ##> '{%nam%}'::text[])
Rows Removed by Filter: 894976
Total runtime: 796.636 ms
(4 rows)
-- WITH INDEX ON NAMES
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=159.73..3167.37 rows=997 width=49) (actual time=20.164..293.942 rows=102400 loops=1)
Recheck Cond: (names ##> '{%nam%}'::text[])
-> Bitmap Index Scan on names_idx (cost=0.00..159.48 rows=997 width=0) (actual time=18.539..18.539 rows=102400 loops=1)
Index Cond: (names ##> '{%nam%}'::text[])
Total runtime: 490.060 ms
(5 rows)
*/
The final performance totally depends on your data and queries, but in my dummy example this extension is very efficient, cutting the query time roughly in half.