Postgres NOT in array - arrays

I'm using Postgres' native array type, and trying to find the records where the ID is not in the array of recipient IDs.
I can find where they are IN:
SELECT COUNT(*) FROM messages WHERE (3 = ANY (recipient_ids))
But this doesn't work:
SELECT COUNT(*) FROM messages WHERE (3 != ANY (recipient_ids))
SELECT COUNT(*) FROM messages WHERE (3 = NOT ANY (recipient_ids))
What's the right way to test for this condition?

SELECT COUNT(*) FROM "messages" WHERE NOT (3 = ANY (recipient_ids))
You can always negate WHERE (condition) with WHERE NOT (condition)
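A small self-contained sketch of this negation, using an inline VALUES list as a stand-in for the messages table (the three sample rows are made up for illustration):

```sql
-- Hypothetical sample rows standing in for the messages table.
SELECT id
FROM (VALUES
        (1, ARRAY[1, 2, 3]),   -- contains 3: filtered out
        (2, ARRAY[4, 5]),      -- does not contain 3: kept
        (3, ARRAY[]::int[])    -- empty array: kept
     ) AS messages(id, recipient_ids)
WHERE NOT (3 = ANY (recipient_ids))
ORDER BY id;
-- returns ids 2 and 3
```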

You could turn it around a bit and say "3 is not equal to all the IDs":
where 3 != all (recipient_ids)
From the fine manual:
9.21.4. ALL (array)
expression operator ALL (array expression)
The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ALL is "true" if all comparisons yield true (including the case where the array has zero elements). The result is "false" if any false result is found.

Beware of NULLs
Both ALL:
(some_value != ALL(some_array))
And ANY:
NOT (some_value = ANY(some_array))
Would work as long as some_array is not null. If the array might be null, then you must account for it with coalesce(), e.g.
(some_value != ALL(coalesce(some_array, array[]::int[])))
Or
NOT (some_value = ANY(coalesce(some_array, array[]::int[])))
From the docs:
If the array expression yields a null array, the result of ANY will be null
If the array expression yields a null array, the result of ALL will be null
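The NULL behaviour can be checked directly with literal values. This is only a sketch; the column aliases are made up for readability:

```sql
SELECT
    3 <> ALL (NULL::int[])                            AS null_array,      -- NULL
    3 <> ALL (COALESCE(NULL::int[], ARRAY[]::int[]))  AS coalesced_array, -- true
    3 <> ALL (ARRAY[1, NULL])                         AS null_element,    -- NULL
    3 <> ALL (ARRAY[3, NULL])                         AS element_found;   -- false
```

Note that coalesce() only covers a NULL array. A NULL element inside the array can still make the result NULL when no other comparison decides it, as the third column shows.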

Augmenting the ALL/ANY Answers
I prefer the solutions that use all or any to achieve the result, and I appreciate the additional notes (e.g. about NULLs). As a further addition, here is a way to think about those operators.
You can think about them as short-circuit operators:
all(array) goes through all the values in the array, comparing each to the reference value using the provided operator. As soon as a comparison yields false, the process ends with false, otherwise true. (Comparable to short-circuit logical and.)
any(array) goes through all the values in the array, comparing each to the reference value using the provided operator. As soon as a comparison yields true, the process ends with true, otherwise false. (Comparable to short-circuit logical or.)
This is why 3 <> any('{1,2,3}') does not yield the desired result: the process compares 3 with 1 for inequality, which is true, and immediately returns true. A single value in the array different from 3 is enough to make the entire condition true. The 3 in the last array position is probably never examined.
3 <> all('{1,2,3}'), on the other hand, makes sure that all values are not equal to 3. It runs through the comparisons that yield true until it reaches an element that yields false (the last one in this case), and then returns false as the overall result. This is what the OP wants.

An update:
As of Postgres 9.3,
you can use NOT in tandem with the @> (contains) operator to achieve this as well,
i.e.
SELECT COUNT(*) FROM "messages" WHERE NOT recipient_ids @> ARRAY[3];

not (3 = any(recipient_ids))?

Note that the ANY/ALL constructs cannot make use of an index on the array column. If indexes are in mind, use the overlap operator with an array on both sides:
SELECT COUNT(*) FROM "messages" WHERE ARRAY[3] && recipient_ids
and the negative:
SELECT COUNT(*) FROM "messages" WHERE NOT (ARRAY[3] && recipient_ids)
An index can then be created like:
CREATE INDEX recipient_ids_idx on tableName USING GIN(recipient_ids)
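Put together, the index setup might look like the following sketch. The table definition and index name are assumptions for illustration, and whether the planner actually picks the index depends on table size and statistics, so EXPLAIN is the way to check:

```sql
-- Hypothetical table definition, for illustration only.
CREATE TABLE messages (
    id            serial PRIMARY KEY,
    recipient_ids int[] NOT NULL
);
CREATE INDEX messages_recipient_ids_idx ON messages USING GIN (recipient_ids);

-- Array operators supported by the default GIN operator class for arrays:
SELECT COUNT(*) FROM messages WHERE recipient_ids @> ARRAY[3];   -- contains 3
SELECT COUNT(*) FROM messages WHERE recipient_ids && ARRAY[3];   -- overlaps {3}

-- Negated form: inspect the plan to see whether the index is used at all.
EXPLAIN SELECT COUNT(*) FROM messages WHERE NOT (recipient_ids && ARRAY[3]);
```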

Use the following query
select id from Example where NOT (id = ANY ('{1, 2}'))

Related

Check if any of multiple values exist in an array in Postgres

Checking if one value exists in an array is pretty trivial. For example, the following will return true.
SELECT 'hello' = ANY(ARRAY['hello', 'bees'])
But what if I wanted to check if any of multiple values exist in an array? For example, I want to return true if 'hello' OR 'bye' exists in an array. I want to do something like
SELECT ['hello', 'bye'] = ANY(ARRAY['hello', 'bees'])
but that doesn't seem to work.
Edit:
I'm also looking to figure out how I can check if any of multiple values exist in an array where the multiple values have common prefixes.
For example, if I want to return true if the array contains any element with the prefix of 'hello'. So I basically want something like
SELECT ARRAY['hello%'] && ARRAY['helloOTHERSTUFF']
to be true.
For checking if any of the array elements exists in another array use overlap && operator like this:
SELECT ARRAY[1,4,3] && ARRAY[2,1] -- true
To check whether every array element matches a specific pattern, you could use the unnest(anyarray) function (to extract the array elements) combined with LIKE or POSIX regular expressions (to apply pattern matching) and the aggregate bool_and(expression), which performs a logical AND over all rows and returns one row of output.
Test case:
I have put array elements in separate lines to clarify which comparison yields true and which false.
SELECT bool_and(array_elements)
FROM (
SELECT unnest(
ARRAY[
'hello', -- compared with LIKE 'hello%' yields true
'helloSOMething', -- true
'helloXX', -- true
'hell', -- false
'nothing' -- false
]) ~~ 'hello%'
) foo(array_elements);
So if any of the comparisons yields false, bool_and(array_elements) will return false.
Note: If you need to compare your array against multiple patterns, you could go with POSIX comparison and use |, which stands for alternation. As an example, let's say we want to find out whether every element of an array starts with either hello or not:
SELECT bool_and(array_elements)
FROM (
SELECT unnest(
ARRAY[
'hello', -- true
'helloSOMething', -- true
'helloXX', -- true
'hell', -- false (starts with neither "hello" nor "not")
'nothing' -- true (starts with "not")
]) ~ '^(hello|not)' -- parentheses anchor both alternatives; note the use of ~ instead of ~~ as earlier (POSIX vs LIKE)
) foo(array_elements);
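The question itself asks whether any element matches rather than every element. A minimal sketch of that variant, reusing the sample values from the question, could swap bool_and for bool_or:

```sql
-- Exact values: true if 'hello' OR 'bye' appears in the array.
SELECT ARRAY['hello', 'bye'] && ARRAY['hello', 'bees'];    -- true

-- Prefix match: true if at least one element starts with 'hello'.
SELECT bool_or(elem LIKE 'hello%')
FROM unnest(ARRAY['helloOTHERSTUFF', 'bees']) AS elem;     -- true
```

For an empty array, bool_or aggregates zero rows and returns NULL rather than false, so wrap it in coalesce() if a strict boolean is needed.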

Compare arrays for equality, ignoring order of elements

I have a table with 4 array columns. The results look like:
ids | signed_ids | new_ids | new_ids_signed
{1,2,3} | {2,1,3} | {4,5,6} | {6,5,4}
Is there any way to compare ids and signed_ids so that they come out equal, ignoring the order of the elements?
You can use the contains (@>) and is-contained-by (<@) operators together:
(array1 <@ array2 and array1 @> array2)
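A quick check of this two-way containment test with literal arrays (the values are illustrative only) shows that it ignores both element order and duplicates:

```sql
SELECT ARRAY[1,2,3] <@ ARRAY[2,1,3] AND ARRAY[1,2,3] @> ARRAY[2,1,3];  -- true
SELECT ARRAY[1,2,2] <@ ARRAY[2,1]   AND ARRAY[1,2,2] @> ARRAY[2,1];    -- true, duplicates are ignored
SELECT ARRAY[1,2,3] <@ ARRAY[1,2]   AND ARRAY[1,2,3] @> ARRAY[1,2];    -- false
```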
The additional module intarray provides operators for arrays of integer, which are typically (much) faster. Install once per database with (in Postgres 9.1 or later):
CREATE EXTENSION intarray;
Then you can:
SELECT uniq(sort(ids)) = uniq(sort(signed_ids));
Or:
SELECT ids @> signed_ids AND ids <@ signed_ids;
uniq() and sort() are functions from intarray; the @> and <@ operators also have intarray versions.
In the second example, operator resolution arrives at the specialized intarray operators if left and right argument are type integer[].
Both expressions will ignore the order and duplication of elements. See the manual for further reading.
intarray operators only work for arrays of integer (int4), not bigint (int8) or smallint (int2) or any other data type.
Unlike the default generic operators, intarray operators do not accept NULL values in arrays. NULL in any involved array raises an exception. If you need to work with NULL values, you can default to the standard, generic operators by schema-qualifying the operator with the OPERATOR construct:
SELECT ARRAY[1,4,null,3]::int[] OPERATOR(pg_catalog.@>) ARRAY[3,1]::int[]
The generic operators can't use indexes with an intarray operator class and vice versa.
Related:
GIN index on smallint[] column not used or error "operator is not unique"
The simplest thing to do is sort them and compare them sorted. See sorting arrays in PostgreSQL.
Given sample data:
CREATE TABLE aa(ids integer[], signed_ids integer[]);
INSERT INTO aa(ids, signed_ids) VALUES (ARRAY[1,2,3], ARRAY[2,1,3]);
The best thing to do, if the array entries are always integers, is to use the intarray extension, as Erwin explains in his answer. It's a lot faster than any pure-SQL formulation.
Otherwise, for a general version that works for any data type, define an array_sort(anyarray):
CREATE OR REPLACE FUNCTION array_sort(anyarray) RETURNS anyarray AS $$
SELECT array_agg(x order by x) FROM unnest($1) x;
$$ LANGUAGE 'SQL';
and use it to sort and compare the sorted arrays:
SELECT array_sort(ids) = array_sort(signed_ids) FROM aa;
There's an important caveat:
SELECT array_sort( ARRAY[1,2,2,4,4] ) = array_sort( ARRAY[1,2,4] );
will be false. This may or may not be what you want, depending on your intentions.
Alternately, define a function array_compare_as_set:
CREATE OR REPLACE FUNCTION array_compare_as_set(anyarray,anyarray) RETURNS boolean AS $$
SELECT CASE
WHEN array_dims($1) <> array_dims($2) THEN
'f'
WHEN array_length($1,1) <> array_length($2,1) THEN
'f'
ELSE
NOT EXISTS (
SELECT 1
FROM unnest($1) a
FULL JOIN unnest($2) b ON (a=b)
WHERE a IS NULL or b IS NULL
)
END
$$ LANGUAGE 'SQL' IMMUTABLE;
and then:
SELECT array_compare_as_set(ids, signed_ids) FROM aa;
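This is subtly different from comparing two array_sort-ed values. array_compare_as_set ignores how often each element occurs, so array_compare_as_set(ARRAY[1,2,3,3],ARRAY[1,2,2,3]) is true, whereas array_sort(ARRAY[1,2,3,3]) = array_sort(ARRAY[1,2,2,3]) is false. Because of the array_dims and array_length checks, arrays of different lengths, such as ARRAY[1,2,3,3] and ARRAY[1,2,3], always compare as false.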
Both of these approaches will have pretty bad performance. Consider ensuring that you always store your arrays sorted in the first place.
If your arrays have no duplicates and are of the same dimension:
use array contains @>
AND array_length, where the length must match on both sides
select (string_agg(a,',' order by a) = string_agg(b,',' order by b)) from
(select unnest(array[1,2,3,2])::text as a,unnest(array[2,2,3,1])::text as b) A

Why does SUM(...) on an empty recordset return NULL instead of 0?

I understand why null + 1 or (1 + null) returns null: null means "unknown value", and if a value is unknown, its successor is unknown as well. The same is true for most other operations involving null.[*]
However, I don't understand why the following happens:
SELECT SUM(someNotNullableIntegerField) FROM someTable WHERE 1=0
This query returns null. Why? There are no unknown values involved here! The WHERE clause returns zero records, and the sum of an empty set of values is 0.[**] Note that the set is not unknown, it is known to be empty.
I know that I can work around this behaviour by using ISNULL or COALESCE, but I'm trying to understand why this behaviour, which appears counter-intuitive to me, was chosen.
Any insights as to why this makes sense?
[*] with some notable exceptions such as null OR true, where obviously true is the right result since the unknown value simply does not matter.
[**] just like the product of an empty set of values is 1. Mathematically speaking, if I were to extend $(\mathbb{Z}, +)$ to $(\mathbb{Z} \cup \{\text{null}\}, +)$, the obvious choice for the identity element would still be 0, not null, since x + 0 = x but x + null = null.
The ANSI-SQL-Standard defines the result of the SUM of an empty set as NULL. Why they did this, I cannot tell, but at least the behavior should be consistent across all database engines.
Reference: http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt on page 126:
b) If AVG, MAX, MIN, or SUM is specified, then
Case:
i) If TXA is empty, then the result is the null value.
TXA is the operative resultset from the selected column.
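The behaviour and the COALESCE workaround mentioned in the question can be seen side by side in a small sketch. The inline VALUES list and the column aliases are made up for illustration:

```sql
SELECT SUM(x)              AS plain_sum,    -- NULL: no rows were summed
       COALESCE(SUM(x), 0) AS sum_or_zero,  -- 0
       COUNT(x)            AS value_count   -- 0: COUNT never returns NULL
FROM (VALUES (1), (2)) AS t(x)
WHERE 1 = 0;
```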
An empty table and a table containing only NULL values behave the same way here: the aggregate function receives no non-NULL input, so it returns NULL. You can consider this as by design in SQL Server.
Example 1
CREATE TABLE testSUMNulls
(
ID TINYINT
)
GO
INSERT INTO testSUMNulls (ID) VALUES (NULL),(NULL),(NULL),(NULL)
SELECT SUM(ID) FROM testSUMNulls
Example 2
CREATE TABLE testSumEmptyTable
(
ID TINYINT
)
GO
SELECT SUM(ID) Sums FROM testSumEmptyTable
In both examples you will get NULL as output.

A PostgreSQL query with 'ANY' is not working

SELECT "Ticket_id" FROM "Tickets"
WHERE "Status" = 1 AND ("Ticket_id" != ANY(array[1,2,3])) Limit 6
And the result is 1,2,3,4,5,6
You want to use ALL, not ANY. From the fine manual:
9.21.3. ANY/SOME (array)
expression operator ANY (array expression)
[...] The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is "true" if any true result is obtained.
So if we say this:
1 != any(array[1,2])
then we'll get true since (1 != 1) or (1 != 2) is true. ANY is essentially an OR operator. For example:
=> select id from (values (1),(2),(3)) as t(id) where id != any(array[1,2]);
id
----
1
2
3
(3 rows)
If we look at ALL, we see:
9.21.4. ALL (array)
expression operator ALL (array expression)
[...] The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ALL is "true" if all comparisons yield true...
so if we say this:
1 != all(array[1,2])
then we'll get false since (1 != 1) and (1 != 2) is false and we see that ALL is essentially an AND operator. For example:
=> select id from (values (1),(2),(3)) as t(id) where id != all(array[1,2]);
id
----
3
(1 row)
If you want to exclude all values in an array, use ALL:
select "Ticket_id"
from "Tickets"
where "Status" = 1
and "Ticket_id" != all(array[1,2,3])
limit 6
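For a fixed list of values, the same exclusion can also be written with NOT IN. A minimal comparison on made-up sample values:

```sql
SELECT id
FROM (VALUES (1), (2), (3), (4)) AS t(id)
WHERE id <> ALL (ARRAY[1, 2, 3]);   -- returns 4

SELECT id
FROM (VALUES (1), (2), (3), (4)) AS t(id)
WHERE id NOT IN (1, 2, 3);          -- returns 4
```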
Do you mean:
"Ticket_id" NOT IN (1,2,3)

Is SQL Server's double checking needed here?

IF @insertedValue IS NOT NULL AND @insertedValue > 0
This logic is in a trigger.
The value comes from a deleted or inserted row (doesn't matter).
2 questions :
Do I need to check both conditions? (I want all value > 0, value in db can be nullable)
Does SQL Server check the expression in the order I wrote it ?
1) Actually, no, since if @insertedValue is NULL, the expression @insertedValue > 0 will evaluate to false. (Actually, as Martin Smith points out in his comment, it will evaluate to a special value "unknown", which when forced to a Boolean result on its own collapses to false - examples: unknown AND true = unknown which is forced to false, unknown OR true = true.) But you're relying on comparison behaviour with NULL values. A single-step equivalent method, BTW, would be:
IF ISNULL(@insertedValue, 0) > 0
IMHO, you're better sticking with the explicit NULL check for clarity if nothing else.
2) Since the query will be optimised before execution, there is absolutely no guarantee of order of execution or short circuiting of the AND operator.
Combining the two - if the double check is truly unnecessary, then it will probably be optimised out before execution anyway, but your SQL code will be more maintainable in my view if you make this explicit.
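A short T-SQL sketch of the NULL case described in point 1. The variable is declared locally here purely for illustration, whereas in the trigger it would come from the inserted or deleted row:

```sql
-- T-SQL sketch; @insertedValue is a local variable used only for illustration.
DECLARE @insertedValue int = NULL;

IF @insertedValue > 0
    SELECT 'positive' AS result;
ELSE
    SELECT 'NULL or not positive' AS result;   -- this branch runs for NULL
```

The comparison with NULL evaluates to unknown, which IF does not treat as true, so the ELSE branch runs without an explicit IS NOT NULL check.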
You can use COALESCE, which returns the first non-null expression among its arguments.
This makes the check more flexible: you can pass several columns or values to COALESCE and then test the result against zero, which lets you check values from multiple columns at once.
declare @val int
set @val = COALESCE( null, 1, 10 )
if(@val>0)
select 'fine'
else
select 'not fine'

Resources