Postgres UNIQUE CONSTRAINT for array

How can I create a constraint that enforces uniqueness over all the values in an array, like this:
CREATE TABLE mytable
(
interface integer[2],
CONSTRAINT link_check UNIQUE (sort(interface))
)
My sort function:
create or replace function sort(anyarray)
returns anyarray as $$
select array(select $1[i] from generate_series(array_lower($1,1),
array_upper($1,1)) g(i) order by 1)
$$ language sql strict immutable;
I need the values {10, 22} and {22, 10} to be considered the same and rejected by the UNIQUE constraint.

I don't think you can use a function with a unique constraint but you can with a unique index. So given a sorting function something like this:
create function sort_array(anyarray) returns anyarray as $$
select array_agg(distinct n order by n) from unnest($1) as t(n);
$$ language sql immutable;
Then you could do this:
create table mytable (
interface integer[2]
);
create unique index mytable_uniq on mytable (sort_array(interface));
Then the following happens:
=> insert into mytable (interface) values (array[11,23]);
INSERT 0 1
=> insert into mytable (interface) values (array[11,23]);
ERROR: duplicate key value violates unique constraint "mytable_uniq"
DETAIL: Key (sort_array(interface))=({11,23}) already exists.
=> insert into mytable (interface) values (array[23,11]);
ERROR: duplicate key value violates unique constraint "mytable_uniq"
DETAIL: Key (sort_array(interface))=({11,23}) already exists.
=> insert into mytable (interface) values (array[42,11]);
INSERT 0 1

@mu already demonstrated how an index on an expression can solve your problem.
The functions used caught my attention. Both seem like overkill for an array of two integers, although this may be a simplification of the real situation.
Anyway, I was intrigued and ran a test with a couple of variants.
Test setup
-- temporary table with 10000 random pairs of integer
CREATE TEMP TABLE arr (i int[]);
INSERT INTO arr
SELECT ARRAY[(random() * 1000)::int, (random() * 1000)::int]
FROM generate_series(1,10000);
Test candidates with a short comment to explain each one:
-- 1) mu's query
CREATE OR REPLACE FUNCTION sort_array1(integer[]) RETURNS int[] AS
$$
SELECT array_agg(n) FROM (SELECT n FROM unnest($1) AS t(n) ORDER BY n) AS a;
$$ LANGUAGE sql STRICT IMMUTABLE;
-- 2) simplified with ORDER BY inside aggregate (pg 9.0+)
CREATE OR REPLACE FUNCTION sort_array2(int[]) RETURNS int[] AS
$$
SELECT array_agg(n ORDER BY n) FROM unnest($1) AS t(n);
$$ LANGUAGE sql STRICT IMMUTABLE;
-- 3) uralbash's query
CREATE OR REPLACE FUNCTION sort_array3(anyarray) RETURNS anyarray AS
$$
SELECT ARRAY(
SELECT $1[i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) g(i)
ORDER BY 1)
$$ LANGUAGE sql STRICT IMMUTABLE;
-- 4) change parameters to int[]
CREATE OR REPLACE FUNCTION sort_array4(int[]) RETURNS int[] AS
$$
SELECT ARRAY(
SELECT $1[i]
FROM generate_series(array_lower($1,1), array_upper($1,1)) g(i)
ORDER BY 1)
$$ LANGUAGE sql STRICT IMMUTABLE;
-- 5) simplify array_lower() - it's always 1
CREATE OR REPLACE FUNCTION sort_array5(int[]) RETURNS int[] AS
$$
SELECT ARRAY(
SELECT $1[i]
FROM generate_series(1, array_upper($1,1)) g(i)
ORDER BY 1)
$$ LANGUAGE sql STRICT IMMUTABLE;
-- 6) further simplify to case with 2 elements
CREATE OR REPLACE FUNCTION sort_array6(int[]) RETURNS int[] AS
$$
SELECT ARRAY(
SELECT i
FROM (VALUES ($1[1]),($1[2])) g(i)
ORDER BY 1)
$$ LANGUAGE sql STRICT IMMUTABLE;
-- 7) my radically simple query
CREATE OR REPLACE FUNCTION sort_array7(int[]) RETURNS int[] AS
$$
SELECT CASE WHEN $1[1] > $1[2] THEN ARRAY[$1[2], $1[1]] ELSE $1 END;
$$ LANGUAGE sql STRICT IMMUTABLE;
-- 8) without STRICT modifier
CREATE OR REPLACE FUNCTION sort_array8(int[]) RETURNS int[] AS
$$
SELECT CASE WHEN $1[1] > $1[2] THEN ARRAY[$1[2], $1[1]] ELSE $1 END;
$$ LANGUAGE sql IMMUTABLE;
Results
I executed each around 20 times and took the best result from EXPLAIN ANALYZE.
SELECT sort_array1(i) FROM arr -- Total runtime: 183 ms
SELECT sort_array2(i) FROM arr -- Total runtime: 175 ms
SELECT sort_array3(i) FROM arr -- Total runtime: 183 ms
SELECT sort_array4(i) FROM arr -- Total runtime: 183 ms
SELECT sort_array5(i) FROM arr -- Total runtime: 177 ms
SELECT sort_array6(i) FROM arr -- Total runtime: 144 ms
SELECT sort_array7(i) FROM arr -- Total runtime: 103 ms
SELECT sort_array8(i) FROM arr -- Total runtime: 43 ms (!!!)
These are the results from a v9.0.5 server on Debian Squeeze. Results on v8.4 were similar.
I also tested plpgsql variants, which were a bit slower, as expected: too much overhead for a tiny operation, and no query plan to cache.
The simple function (no. 7) is substantially faster than the others. That was to be expected; the overhead of the other variants is just too much for a tiny array.
What I did not expect is that leaving out the STRICT modifier more than doubles the speed. I posted a separate question about this phenomenon.

Just create a unique index on the two values:
create unique index ix on
mytable(least(interface[1], interface[2]), greatest(interface[1], interface[2]));
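A minimal sketch of how this index behaves, assuming a scratch table (the names pair_demo and pair_demo_uniq are hypothetical, and ON CONFLICT requires Postgres 9.5 or later):

```sql
-- Hypothetical scratch table; only the index expression comes from the answer above.
CREATE TEMP TABLE pair_demo (interface integer[2]);

CREATE UNIQUE INDEX pair_demo_uniq ON pair_demo
    (least(interface[1], interface[2]), greatest(interface[1], interface[2]));

INSERT INTO pair_demo VALUES (ARRAY[10, 22]);

-- The reversed pair maps to the same (least, greatest) key,
-- so it is treated as a duplicate and skipped here.
INSERT INTO pair_demo VALUES (ARRAY[22, 10]) ON CONFLICT DO NOTHING;

SELECT count(*) FROM pair_demo;  -- 1
```

Unlike the sort-function variants, this only covers arrays of exactly two elements, which matches the integer[2] column in the question.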


Why recursive union does not work with composite types in PostgreSQL

I have a table with fields of composite type. When I've tried to perform recursive union with such fields I got an error.
drop type if exists example_t cascade;
create type example_t as (
value text,
key text
);
drop table if exists example cascade;
create table example (
inbound example_t,
outbound example_t,
primary key (inbound, outbound)
);
create or replace function example_fn(_attrs example_t[])
returns table (attr example_t) as $$
with recursive target as (
select outbound
from example
where array[inbound] <@ _attrs
union
select r.outbound
from target as t
inner join example as r on r.inbound = t.outbound
)
select unnest(_attrs)
union
select * from target;
$$ language sql immutable;
select example_fn(array[('foo', 'bar') ::example_t]);
ERROR: could not implement recursive UNION
DETAIL: All column datatypes must be hashable.
CONTEXT: SQL function "example_fn" during startup
SQL state: 0A000
A non-recursive union just works:
create or replace function example_fn(_attrs example_t[])
returns table (attr example_t) as $$
select unnest(_attrs)
union
select * from example;
$$ language sql immutable;
select example_fn(array[('foo', 'bar') ::example_t]);
I can refactor my function as follows to make it work, but it looks odd and is less readable. Is there a better way to do it?
create or replace function example_fn(_attrs example_t[])
returns table (attr example_t) as $$
with recursive target as (
select (outbound).value, (outbound).key
from example
where array[inbound] <@ _attrs
union
select (r.outbound).value, (r.outbound).key
from target as t
inner join example as r on r.inbound = (t.value, t.key) ::example_t
)
select (unnest(_attrs)).*
union
select * from target;
$$ language sql immutable;
There is a thread on the PostgreSQL hackers mailing list with this short explanation by Tom Lane:
In general we consider that a datatype's notion of equality can be defined either by its default btree opclass (which supports sort-based query algorithms) or by its default hash opclass (which supports hash-based query algorithms).
The plain UNION code supports either sorting or hashing, but we've not gotten around to supporting a sort-based approach to recursive UNION. I'm not convinced that it's worth doing ...
As a workaround, use union all:
with recursive target as (
select outbound
from example
where inbound = ('a', 'a')::example_t
union all
select r.outbound
from target as t
inner join example as r on r.inbound = t.outbound
)
select *
-- or, if necessary
-- select distinct *
from target
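One caveat with the union all workaround: it no longer discards rows that were already produced, so cyclic data can make the recursion run forever. The following is a rough sketch of one common guard, which tracks the visited values in an array. The names node_t, edge, and path are hypothetical and not part of the original schema.

```sql
-- Hypothetical stand-ins for example_t and example, including a cycle a -> b -> a.
create type node_t as (value text, key text);
create temp table edge (inbound node_t, outbound node_t);
insert into edge values
    (('a', 'a')::node_t, ('b', 'b')::node_t),
    (('b', 'b')::node_t, ('a', 'a')::node_t);

with recursive target as (
    select outbound, array[outbound] as path
    from edge
    where inbound = ('a', 'a')::node_t
  union all
    select r.outbound, array_append(t.path, r.outbound)
    from target as t
    inner join edge as r on r.inbound = t.outbound
    where r.outbound <> all (t.path)   -- stop when a value was already visited on this path
)
select distinct outbound from target;
```

The guard is per path, so the final select distinct is still needed when the same value can be reached along different paths.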

Postgresql stored procedure return select result set

In Microsoft SQL Server I could do something like this:
create procedure my_procedure @argument1 int, @argument2 int
as
select *
from my_table
where ID > @argument1 and ID < @argument2
And that would return a table with all columns from my_table.
The closest thing I have managed in PostgreSQL is:
create or replace function
get_test()
returns setof record
as
$$ select * from my_table $$
language sql
Or I could define the table type explicitly, but manually recreating what technically already exists is very impractical:
create or replace function
get_agent_summary()
returns table (
column1 type, column2 type, ...
)
as
$$
begin
return query select col1, col2, ... from my_existing_table;
...
and it is a pain to maintain.
So, how can I easily return a result set without redefining every single column of the table I want to return?
In Postgres a table automatically defines the corresponding type:
create or replace function select_my_table(argument1 int, argument2 int)
returns setof my_table language sql as $$
select *
from my_table
where id > argument1 and id < argument2;
$$;
select * from select_my_table(0, 2);
The syntax is more verbose than in MS SQL Server because you can create functions in one of several languages and functions may be overloaded.

How to get distinct array elements with postgres?

I have an array with duplicate values in postgres. For example:
SELECT cardinality(string_to_array('1,2,3,4,4', ',')::int[]) as foo
=> "foo"=>"5"
I would like to get unique elements, for example:
SELECT cardinality(uniq(string_to_array('1,2,3,4,4', ',')::int[])) as foo
=> -- No function matches the given name and argument types. You might need to add explicit type casts.
Can I get the unique elements of an array in Postgres without using UNNEST?
I prefer this syntax (about 5% faster)
create or replace function public.array_unique(arr anyarray)
returns anyarray as $body$
select array( select distinct unnest($1) )
$body$ language sql;
Usage:
select array_unique(ARRAY['1','2','3','4','4']);
For integer arrays use intarray extension:
create extension if not exists intarray;
select cardinality(uniq(string_to_array('1,2,3,4,4', ',')::int[])) as foo
or the function
create or replace function public.array_unique(arr anyarray)
returns anyarray
language sql
as $function$
select array_agg(distinct elem)
from unnest(arr) as arr(elem)
$function$;
for any array. You can easily modify the function to preserve the original order of the array elements:
create or replace function public.array_unique_ordered(arr anyarray)
returns anyarray
language sql
as $function$
select array_agg(elem order by ord)
from (
select distinct on(elem) elem, ord
from unnest(arr) with ordinality as arr(elem, ord)
order by elem, ord
) s
$function$;
Example:
with my_data(arr) as (values ('{d,d,a,c,b,b,a,c}'::text[]))
select array_unique(arr), array_unique_ordered(arr)
from my_data
array_unique | array_unique_ordered
--------------+----------------------
{a,b,c,d} | {d,a,c,b}
(1 row)
Going off of @klin's accepted answer, I modified it to also remove nulls while selecting only the distinct values.
create or replace function public.array_unique_no_nulls(arr anyarray)
returns anyarray
language sql
as $function$
select array_agg(distinct a)
from (
select unnest(arr) a
) alias
where a is not null
$function$;

Stored procedure syntax with IN condition

(1)
=>CREATE TABLE T1(id BIGSERIAL PRIMARY KEY, name TEXT);
CREATE TABLE
(2)
=>INSERT INTO T1
(name) VALUES
('Robert'),
('Simone');
INSERT 0 2
(3)
SELECT * FROM T1;
id | name
----+--------
1 | Robert
2 | Simone
(2 rows)
(4)
CREATE OR REPLACE FUNCTION test_me(id_list BIGINT[])
RETURNS BOOLEAN AS
$$
BEGIN
PERFORM * FROM T1 WHERE id IN ($1);
IF FOUND THEN
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
END;
$$
LANGUAGE 'plpgsql';
CREATE FUNCTION
My problem is with calling the procedure. I'm not able to find an example on the net showing how to pass a list of values of type BIGINT (or integer, for that matter).
I tried the following without success:
First syntax:
eway=> SELECT * FROM test_me('{1,2}'::BIGINT[]);
ERROR: operator does not exist: bigint = bigint[]
LINE 1: SELECT * FROM T1 WHERE id IN ($1)
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
QUERY: SELECT * FROM T1 WHERE id IN ($1)
CONTEXT: PL/pgSQL function test_me(bigint[]) line 3 at PERFORM
Second syntax:
eway=> SELECT * FROM test_me('{1,2}');
ERROR: operator does not exist: bigint = bigint[]
LINE 1: SELECT * FROM T1 WHERE id IN ($1)
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
QUERY: SELECT * FROM T1 WHERE id IN ($1)
CONTEXT: PL/pgSQL function test_me(bigint[]) line 3 at PERFORM
Third syntax:
eway=> SELECT * FROM test_me(ARRAY [1,2]);
ERROR: operator does not exist: bigint = bigint[]
LINE 1: SELECT * FROM T1 WHERE id IN ($1)
^
HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
QUERY: SELECT * FROM T1 WHERE id IN ($1)
CONTEXT: PL/pgSQL function test_me(bigint[]) line 3 at PERFORM
Any clues about a working syntax?
It's as if the parser were trying to compare a BIGINT to a BIGINT[] in the PERFORM statement, but that doesn't make any sense to me...
All your syntax variants to pass an array are correct.
Pass array literal to PostgreSQL function
The problem is with the expression inside the function. You can test with the ANY construct, as @Mureinik showed, or with a number of other syntax variants. In any case, run the test with an EXISTS expression:
CREATE OR REPLACE FUNCTION test_me(id_list bigint[])
RETURNS bool AS
$func$
BEGIN
IF EXISTS (SELECT 1 FROM t1 WHERE id = ANY ($1)) THEN
RETURN true;
ELSE
RETURN false;
END IF;
END
$func$ LANGUAGE plpgsql STABLE;
Notes
EXISTS is shortest and most efficient:
PL/pgSQL checking if a row exists - SELECT INTO boolean
The ANY construct applied to arrays is only efficient with small arrays. For longer arrays, other syntax variants are faster. Like:
IF EXISTS (SELECT 1 FROM unnest($1) id JOIN t1 USING (id)) THEN ...
How to do WHERE x IN (val1, val2,…) in plpgsql
Don't quote the language name, it's an identifier, not a string: LANGUAGE plpgsql
Simple variant
Since you are returning a boolean value, it can be even simpler. It's probably just for the demo, but as a proof of concept:
CREATE OR REPLACE FUNCTION test_me(id_list bigint[])
RETURNS bool AS
$func$
SELECT EXISTS (SELECT 1 FROM t1 WHERE id = ANY ($1))
$func$ LANGUAGE sql STABLE;
Same result.
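To illustrate why the original IN ($1) raised "operator does not exist: bigint = bigint[]": IN expects a parenthesized list of scalar expressions, so a single array parameter is compared to id as one value, while = ANY accepts one array and compares against its elements. A small sketch with literal values standing in for the function parameter:

```sql
-- IN takes a list of scalars; each list item is compared with the left operand.
SELECT 1::bigint IN (1, 2) AS in_list;                      -- true

-- = ANY takes a single array and compares against each of its elements.
SELECT 1::bigint = ANY ('{1,2}'::bigint[]) AS any_match;    -- true
SELECT 3::bigint = ANY ('{1,2}'::bigint[]) AS any_no_match; -- false

-- A one-item IN list holding an array compares bigint to bigint[] and fails:
-- SELECT 1::bigint IN ('{1,2}'::bigint[]);
-- ERROR: operator does not exist: bigint = bigint[]
```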
The easiest way to check if an item is in an array is with = ANY:
CREATE OR REPLACE FUNCTION test_me(id_list BIGINT[])
RETURNS BOOLEAN AS
$$
BEGIN
PERFORM * FROM T1 WHERE id = ANY ($1);
IF FOUND THEN
RETURN TRUE;
ELSE
RETURN FALSE;
END IF;
END;
$$
LANGUAGE plpgsql;

Postgres function with text array and select where in query

I need to create a function like this (scaled down to a minimum) to which I pass an array of strings that should be matched, but I can't get the query to work.
create or replace function bar(x text[]) returns table (c bigint) language plpgsql as $$
begin
return query select count(1) as counter from my_table where my_field in (x);
end;$$;
and call it like this
select * from bar(ARRAY ['a','b']);
I could make the parameter x a single text string instead and then use something like
return query execute 'select ... where myfield in ('||x||')';
So how would I make it work with the parameter as an array?
Would that be better or worse than passing the parameter as a string?
Yes, an array is the cleaner form. String matching would leave corner cases where combinations of separators and patterns produce false matches ...
To find strings that match any of the given patterns, use the ANY construct:
CREATE OR REPLACE FUNCTION bar(x text[])
RETURNS bigint LANGUAGE sql AS
$func$
SELECT count(*) -- an alias wouldn't be visible outside the function
FROM my_table
WHERE my_field = ANY(x);
$func$;
count(*) is slightly faster than count(1). Same result.
Note, I am using a plain SQL function (instead of plpgsql). Either has its pros and cons.
That's fixed with the help of unnest that converts an array to a set (btw, the function doesn't have to be plpgsql):
CREATE OR REPLACE FUNCTION bar(x text[]) RETURNS BIGINT LANGUAGE sql AS $$
SELECT count(1) AS counter FROM my_table
WHERE my_field IN (SELECT * FROM unnest(x));
$$;
The problem with using the array seems to be fixed by using
return query select count(1) as counter from my_table where my_field in (array_to_string(x,','));
The question of efficiency still remains unsolved.
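A caution on the array_to_string variant: it avoids the type error, but it does not appear to match element by element. array_to_string collapses the array into one string, so the IN list contains a single value such as 'a,b'. A quick check with literal values (the column aliases are made up for the demo):

```sql
-- The whole array becomes one comma-separated string.
SELECT array_to_string(ARRAY['a', 'b'], ',') AS joined;   -- a,b

SELECT 'a' IN (array_to_string(ARRAY['a', 'b'], ',')) AS via_string,  -- false: compares 'a' with 'a,b'
       'a' = ANY (ARRAY['a', 'b'])                    AS via_any;     -- true: compares with each element
```

For per-element matching, the = ANY(x) or unnest(x) forms shown in the other answers are the ones to use.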
