Postgres jsonb contains @> should consider duplicates - database

I am trying to figure out how to do "contains all, including duplicate elements" on a Postgres jsonb array.
I am currently using @>, which returns true even when the duplicate elements aren't present in the original array.
For example, I am looking for an operator for which this query returns true:
select '[1, 2, 3]'::jsonb @> '[1, 2]';
but this query returns false:
select '[1, 2, 3]'::jsonb @> '[1, 1]';
@> returns true for both.

There's no built-in jsonb containment function that also respects the elements' counts.
You can write such a function yourself, e.g.:
create or replace function jsonb_full_contain(a jsonb, b jsonb)
returns boolean language sql as $$
    select not exists (
        select 1 from (
            select 't1' t, jsonb_array_elements(a) v
            union all
            select 't2', jsonb_array_elements(b) v
        ) tt
        group by v
        having count(case when t = 't1' then 1 end) < count(case when t = 't2' then 1 end)
    )
$$;
select jsonb_full_contain('[1,2,3]'::jsonb, '[1,1]'::jsonb); -- returns false
select jsonb_full_contain('[1,2,3]'::jsonb, '[1]'::jsonb); -- returns true
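The counting rule in jsonb_full_contain can also be checked outside the database. Below is a minimal Python sketch of the same multiset comparison, assuming the arrays have already been parsed into Python lists of hashable values. The name full_contain is hypothetical and not part of Postgres.

```python
from collections import Counter

def full_contain(a, b):
    """Return True if every element of b occurs in a at least as often as in b."""
    # Counter subtraction keeps only positive counts, i.e. the elements
    # that b needs more copies of than a can supply.
    missing = Counter(b) - Counter(a)
    return not missing

print(full_contain([1, 2, 3], [1, 1]))  # False
print(full_contain([1, 2, 3], [1, 2]))  # True
```

This mirrors the SQL version: the HAVING clause finds values whose count in b exceeds their count in a, and the function returns true only when no such value exists.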

Related

How to perform sql aggregation on Snowflake array and output multiple arrays?

I have a Snowflake table with an array column, shown below as input. For each row I want to check every value in the array and split the values into separate output arrays based on their length: 5-digit values in one column and 6-digit values in another.
ID_COL,ARRAY_COL_VALUE
1,[22,333,666666]
2,[1,55555,999999999]
3,[22,444]
Output table:
ID_COL,FIVE_DIGIT_COL,SIX_DIGIT_COL
1,[],[666666]
2,[55555],[]
3,[],[]
Please let me know whether it is possible to iterate over each array value, check its length, and aggregate the results into separate output columns. A pure SQL solution would be great, but a JavaScript or Python UDF would also work.
Using SQL and FLATTEN:
CREATE OR REPLACE TABLE t(ID_COL INT,ARRAY_COL_VALUE VARIANT)
AS
SELECT 1,[22,333,666666] UNION ALL
SELECT 2,[1,55555,999999999] UNION ALL
SELECT 3,[22,444];
Query:
SELECT ID_COL,
ARRAY_AGG(CASE WHEN s.value BETWEEN 10000 AND 99999 THEN s.value END) AS FIVE_DIGIT_COL,
ARRAY_AGG(CASE WHEN s.value BETWEEN 100000 AND 999999 THEN s.value END) AS SIX_DIGIT_COL
FROM t, TABLE(FLATTEN(ARRAY_COL_VALUE)) AS s
GROUP BY ID_COL;
And Python UDF:
create or replace function filter_arr(arr variant, num_digits INT)
returns variant
language python
runtime_version = 3.8
handler = 'main'
as $$
def main(arr, num_digits):
    return [x for x in arr if len(str(x))==num_digits]
$$;
SELECT ID_COL,
ARRAY_COL_VALUE,
filter_arr(ARRAY_COL_VALUE, 5),
filter_arr(ARRAY_COL_VALUE, 6)
FROM t;
If you're dealing strictly with numbers, here's another way:
with cte (id, array_col) as
(select 1,[22,333,666666,666666] union all
select 2,[1,22222,55555,999999999] union all
select 3,[22,444])
select *,
concat(',',array_to_string(array_col,',,'),',') as str_col,
regexp_substr_all(str_col,',([^,]{5}),',1,1,'e') as len_5,
regexp_substr_all(str_col,',([^,]{6}),',1,1,'e') as len_6
from cte;
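The basic idea is to turn the array into a string in which every value is surrounded by commas, so that each value can be matched with regexp_substr_all. The join delimiter is doubled (',,') so that every value has its own leading and trailing comma and adjacent matches don't compete for the same one.
If you're dealing with strings, you can modify it to use a delimiter that won't show up in your data.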
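To see why the doubled delimiter matters, here is a small Python sketch of the same string trick. It is an illustration only, not Snowflake code: the function name values_with_length is hypothetical, and Python's re.findall stands in for regexp_substr_all.

```python
import re

def values_with_length(values, n):
    """Return the values (as strings) whose text is exactly n characters long."""
    # Join with a doubled comma and wrap in commas so that every value
    # carries its own leading and trailing delimiter, e.g. ",22,,333,,666666,".
    wrapped = "," + ",,".join(str(v) for v in values) + ","
    pattern = ",([^,]{%d})," % n
    # findall returns only the captured group, i.e. the value without commas.
    return re.findall(pattern, wrapped)

print(values_with_length([22, 333, 666666, 666666], 6))     # ['666666', '666666']
print(values_with_length([1, 22222, 55555, 999999999], 5))  # ['22222', '55555']
```

With a single comma between values, two neighbouring matches would need to share one comma, and a non-overlapping regex scan would skip the second value.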

List sorting in Snowflake as part of the select

Let's assume I have the following query:
select column1, count(*) from (values ('a'), ('b'), ('ab'), ('ba')) group by 1;
COLUMN1 COUNT(*)
a 1
b 1
ba 1
ab 1
I want the grouping to be order-insensitive, meaning that ab and ba should be counted as the same value.
So the expected result would be:
COLUMN1 COUNT(*)
a 1
b 1
ab 2
I thought about sorting the characters of each value so that both are treated as the same value, but I didn't find any way to sort a string value in Snowflake.
There may be a way to do this in SQL, but a JavaScript UDF makes it easy:
create or replace function SORT_STRING(TEXT string)
returns string
language javascript
strict immutable
as
$$
return TEXT.split('').sort().join('');
$$;
select SORT_STRING(column1) SORTED_C1, count(*)
from (values ('a'), ('b'), ('ab'), ('ba')) group by 1;
SORTED_C1 COUNT(*)
a 1
b 1
ab 2
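The same normalize-then-group idea, sketched in Python for comparison. This is only a model of the logic, and the name count_order_insensitive is hypothetical.

```python
from collections import Counter

def count_order_insensitive(values):
    """Count values, treating strings with the same characters as equal."""
    # Sorting the characters gives 'ab' and 'ba' the same grouping key, 'ab'.
    return Counter("".join(sorted(v)) for v in values)

counts = count_order_insensitive(["a", "b", "ab", "ba"])
print(dict(counts))  # {'a': 1, 'b': 1, 'ab': 2}
```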

SQL IN Wildcard Char

I have a query that uses an IN filter and works fine. I am wondering whether there
is something like a wildcard character that makes the filter match everything.
Select *
FROM [tbl_Leads]
where p_contact_first_name in ('Tom')
The above works as desired, but what if I don't want to filter by anything and want to return all rows? I know I can create a second query without the IN clause, but it would be nicer if I could check whether a filter value exists and, if none is present, replace it with a wildcard character.
The IN operator doesn't allow wildcards or partial matches. In fact, it is just syntactic sugar for a chain of OR conditions.
This query:
SELECT 1 FROM SomeTable AS T
WHERE T.Column IN (1, 2, 3, 4)
Is exactly the same as:
SELECT 1 FROM SomeTable AS T
WHERE
T.Column = 1 OR
T.Column = 2 OR
T.Column = 3 OR
T.Column = 4
This is also why a NULL value in a NOT IN list means the condition can never be true: it evaluates to either false or UNKNOWN (which is treated as false), so no record is ever returned:
SELECT 1 FROM SomeTable AS T
WHERE T.Column NOT IN (1, 2, NULL, 4)
Will be:
SELECT 1 FROM SomeTable AS T
WHERE
NOT(
T.Column = 1 OR
T.Column = 2 OR
T.Column = NULL OR -- Always resolves to UNKNOWN, so NOT(...) can never be true
T.Column = 4
)
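The NULL behaviour above follows from SQL's three-valued logic. As a rough model (not how any database engine is implemented), the Python sketch below uses None for NULL/UNKNOWN. All function names here are hypothetical.

```python
def sql_eq(a, b):
    """SQL '=': comparing anything with NULL yields UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a == b

def sql_or(x, y):
    """SQL OR: TRUE wins, otherwise UNKNOWN wins over FALSE."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

def sql_not(x):
    """SQL NOT: UNKNOWN stays UNKNOWN."""
    return None if x is None else (not x)

def sql_not_in(value, items):
    """Expand 'value NOT IN (items)' into NOT (v = i1 OR v = i2 OR ...)."""
    result = False
    for item in items:
        result = sql_or(result, sql_eq(value, item))
    return sql_not(result)

print(sql_not_in(5, [1, 2, 4]))        # True
print(sql_not_in(5, [1, 2, None, 4]))  # None (UNKNOWN, so the row is filtered out)
print(sql_not_in(1, [1, 2, None, 4]))  # False
```

A WHERE clause keeps a row only when the condition is TRUE, so with a NULL in the list no value can ever pass the NOT IN test.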
You have a few options to conditionally apply a filter like IN:
Use OR against another condition:
DECLARE @ApplyInFilter BIT = 0
SELECT 1 FROM SomeTable AS T
WHERE
(@ApplyInFilter = 1 AND T.Column IN (1, 2, 3, 4)) OR
@ApplyInFilter = 0
Avoid the query altogether (have to repeat whole statement):
DECLARE @ApplyInFilter BIT = 0
IF @ApplyInFilter = 1
BEGIN
SELECT 1 FROM SomeTable AS T
WHERE
T.Column IN (1, 2, 3, 4)
END
ELSE
BEGIN
SELECT 1 FROM SomeTable AS T
END
Use Dynamic SQL to conditionally omit the filter:
DECLARE @ApplyInFilter BIT = 0
DECLARE @DynamicSQL VARCHAR(MAX) = '
SELECT 1 FROM SomeTable AS T '
IF @ApplyInFilter = 1
SET @DynamicSQL += ' WHERE T.Column IN (1, 2, 3, 4) '
EXEC (@DynamicSQL)
Unfortunately, the best approach if you plan to have multiple conditional filters is the Dynamic SQL one. It will be the hardest to code but the best for performance (with some caveats). Please read George Menoutis's link to fully understand the pros and cons of each approach.
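When the query is built from application code rather than inside T-SQL, the dynamic approach usually amounts to appending the WHERE clause only when filter values are present. The sketch below illustrates this under some assumptions: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, and the leads table and fetch_leads function are hypothetical. The filter values are bound as parameters rather than concatenated into the SQL text.

```python
import sqlite3

def fetch_leads(conn, names=None):
    """Return all leads, or only those whose name is in `names` when given."""
    sql = "SELECT id, name FROM leads"
    params = []
    if names:
        # Only the placeholders are generated dynamically; the values
        # themselves are passed separately as bound parameters.
        placeholders = ", ".join("?" for _ in names)
        sql += f" WHERE name IN ({placeholders})"
        params = list(names)
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()

# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leads (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO leads (name) VALUES (?)",
                 [("Tom",), ("Ben",), ("Kim",)])

print(fetch_leads(conn, ["Tom"]))  # [(1, 'Tom')]
print(fetch_leads(conn))           # [(1, 'Tom'), (2, 'Ben'), (3, 'Kim')]
```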
You can use NOT EXISTS to get the desired result. As I understand it, if a name like Tom exists you want only that row, and if it does not you want all rows to be displayed.
select 1 as ID, 'Tom' as Name into #temp
union all
select 2 as ID, 'Ben' as Name union all
select 3 as ID, 'Kim' as Name
union all
select 4 as ID, 'Jim' as Name
This query checks whether Tom exists: if so, it displays only that row; if not, it displays all rows.
select * from #temp
where name = 'Tom' or not exists (select 1 from #temp where name = 'Tom')
Result from above query:
ID Name
1 Tom
Let's test it by deleting the row that holds the Tom record.
Delete from #temp
where name = 'Tom'
If you run the same query, you get the following result:
select * from #temp
where name = 'Tom' or not exists (select 1 from #temp where name = 'Tom')
ID Name
2 Ben
3 Kim
4 Jim
As Dale Burrell said, the fast way to implement dynamic search conditions (which is exactly what your problem is) is to add code like:
....and field=values or @searchThisField=0
The other solution would be dynamic sql.
I consider Erland Sommarskog's article to be the epitome of analyzing this specific subject.
Make two requests. The performance of these two queries will be better than that of a single universal query. You can compare the execution plan for these queries.

postgres array_agg ERROR: cannot accumulate arrays of different dimensionality

I have a parcels table in PostgreSQL in which the zoning and zoning_description columns are array_agg results cast to text. The new.universities table has 9 rows, and I need to return 9 rows in the output.
The purpose of this query is to find all the properties these universities are located on, collapse their zoning types into one unique column, and union/dissolve their geometries into multipolygons.
select array_agg(distinct dp.zoning) zoning,array_agg(distinct dp.zoning_description) zoning_description,
uni.school name_,uni.address,'University' type_,1000 buff,st_union(dp.geom)
from new.universities uni join new.detroit_parcels_update dp
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff
I get this error
ERROR: cannot accumulate arrays of different dimensionality
********** Error **********
ERROR: cannot accumulate arrays of different dimensionality
SQL state: 2202E
I can do array_agg(distinct dp.zoning::text) zoning etc., but this returns a completely messed-up column with arrays nested in arrays.
Based on the answer, here is my updated query, which does not work:
select array_agg(distinct zoning_u) zoning,array_agg(distinct zoning_description_u) zoning_description,
uni.school name_,uni.address,'University' type_,1000::int buff,st_union(dp.geom) geom
from new.detroit_parcels_update dp,unnest(zoning) zoning_u,
unnest(zoning_description) zoning_description_u
join new.universities uni
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff order by name_
I get this error:
ERROR: invalid reference to FROM-clause entry for table "dp"
LINE 6: on st_intersects(st_buffer(uni.geom,-10),dp.geom)
^
HINT: There is an entry for table "dp", but it cannot be referenced from this part of the query.
********** Error **********
ERROR: invalid reference to FROM-clause entry for table "dp"
SQL state: 42P01
Hint: There is an entry for table "dp", but it cannot be referenced from this part of the query.
Character: 373
My final query, which worked, was:
with t as(select dp.zoning,dp.zoning_description,uni.school name_,uni.address,'University' type_,1000::int buff,st_union(dp.geom) geom
from new.detroit_parcels_update dp
join new.universities uni
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff,dp.zoning,zoning_description order by name_
)
select name_,address,type_,buff,st_union(geom) geom,array_agg(distinct z) zoning, array_agg(distinct zd) zoning_description
from t,unnest(zoning) z,unnest(zoning_description) zd
group by name_,address,type_,buff
Example data:
create table my_table(name text, numbers text[], letters text[]);
insert into my_table values
('first', '{1, 2}', '{a}' ),
('first', '{2, 3}', '{a, b}'),
('second', '{4}', '{c, d}'),
('second', '{5, 6}', '{c}' );
You should aggregate array elements, not arrays. Use unnest():
select
name,
array_agg(distinct number) as numbers,
array_agg(distinct letter) as letters
from
my_table,
unnest(numbers) as number,
unnest(letters) as letter
group by name;
name | numbers | letters
--------+---------+---------
first | {1,2,3} | {a,b}
second | {4,5,6} | {c,d}
(2 rows)
Alternatively, you can create a custom aggregate. You need a function to merge arrays (concatenation with duplicates removed):
create or replace function public.array_merge(arr1 anyarray, arr2 anyarray)
returns anyarray language sql immutable
as $$
select array_agg(distinct elem order by elem)
from (
select unnest(arr1) elem
union
select unnest(arr2)
) s
$$;
create aggregate array_merge_agg(anyarray) (
sfunc = array_merge,
stype = anyarray
);
select
name,
array_merge_agg(numbers) as numbers,
array_merge_agg(letters) as letters
from my_table
group by name;
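Both variants above aggregate the individual elements per group rather than the arrays themselves. As a rough illustration of that logic (not of how Postgres executes it), here is a Python sketch over the same example data. The function name merge_by_name is hypothetical.

```python
from collections import defaultdict

# Same rows as my_table: (name, numbers, letters)
rows = [
    ("first", ["1", "2"], ["a"]),
    ("first", ["2", "3"], ["a", "b"]),
    ("second", ["4"], ["c", "d"]),
    ("second", ["5", "6"], ["c"]),
]

def merge_by_name(rows):
    """Collect the distinct elements of each array column per name."""
    numbers = defaultdict(set)
    letters = defaultdict(set)
    for name, nums, lets in rows:
        # Add elements one by one (like unnest), so arrays of different
        # lengths never have to be stacked into one nested array.
        numbers[name].update(nums)
        letters[name].update(lets)
    return {name: (sorted(numbers[name]), sorted(letters[name]))
            for name in numbers}

print(merge_by_name(rows))
# {'first': (['1', '2', '3'], ['a', 'b']), 'second': (['4', '5', '6'], ['c', 'd'])}
```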
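A much simpler alternative is to create a custom aggregate function (you only need to do this once; on PostgreSQL 14 and later, where array_cat is declared with anycompatiblearray, use anycompatiblearray in place of anyarray):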
CREATE AGGREGATE array_concat_agg(anyarray) (
SFUNC = array_cat,
STYPE = anyarray
);
Then replace array_agg with array_concat_agg:
SELECT
array_concat_agg(DISTINCT dp.zoning) zoning,
array_concat_agg(DISTINCT dp.zoning_description) zoning_description,
uni.school name_,
uni.address,
'University' type_,
1000 buff,
st_union(dp.geom)
FROM
new.universities uni
JOIN new.detroit_parcels_update dp ON st_intersects(st_buffer(uni.geom, - 10), dp.geom)
GROUP BY
name_,
uni.address,
type_,
buff

Check if a group contains all the ids in any of the arrays supplied

I pass a 2D array to a procedure. This array contains multiple arrays of ids. I want to:
group a table by group_id
for each group, for each array in the 2d array
IF this group has all the ids within this iteration array, then return it
I read here about issues with 2d arrays:
postgres, contains-operator for multidimensional arrays performs flatten before comparing?
I think I'm nearly there, but I am unsure how to get around the problem. I understand why the following code produces the error "Subquery can only return one column", but I can't work out how to fix it:
DEALLOCATE my_proc;
PREPARE my_proc (bigint[][]) AS
WITH cte_arr AS (select $1 AS arr),
cte_s AS (select generate_subscripts(arr,1) AS subscript,
arr from cte_arr),
grouped AS (SELECT ufs.user_id, array_agg(entity_id)
FROM table_A AS ufs
GROUP BY ufs.user_id)
SELECT *
FROM grouped
WHERE (select arr[subscript:subscript] @> array_agg AS sub,
arr[subscript:subscript]
from cte_s);
EXECUTE my_proc(array[array[1, 2], array[1,3]]);
You can create a row for each group and each array in the parameter with a cross join:
PREPARE stmt (bigint[][]) AS
with grouped as
(
select user_id
, array_agg(entity_id) as user_groups
from table_A
group by
user_id
)
select user_id
, user_groups
, $1[subscript:subscript] as matches
from grouped
cross join
generate_subscripts($1, 1) as gen(subscript)
where user_groups @> $1[subscript:subscript]
;
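The cross join pairs every group with every inner array and keeps the pairs where the group contains all of that array's ids. A minimal Python sketch of the same pairing follows, assuming the groups have already been aggregated into a dict. The function name matching_groups and the sample ids are hypothetical. Like the array @> operator, the subset test ignores duplicates.

```python
def matching_groups(groups, candidates):
    """Return (user_id, candidate) pairs where the group holds every candidate id."""
    matches = []
    for user_id, entity_ids in groups.items():
        owned = set(entity_ids)
        # One comparison per (group, candidate array) pair, like the cross join.
        for candidate in candidates:
            if set(candidate) <= owned:
                matches.append((user_id, candidate))
    return matches

# Hypothetical data: user_id -> aggregated entity ids
groups = {10: [1, 2, 5], 20: [1, 3], 30: [2]}
print(matching_groups(groups, [[1, 2], [1, 3]]))
# [(10, [1, 2]), (20, [1, 3])]
```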