optimize merging and summing jsonb arrays of objects by key in postgresql

I need a postgresql function that merges and sums (by key) four jsonb arrays of objects. Each jsonb can have zero or multiple objects:
parameter 1: [ {"a": 1.0}, {"b": 2.5} ]
parameter 2: [ {"a": 1.0} ]
parameter 3: [ {"a": 1.0}, {"c": 2.5} ]
parameter 4: [ {"a": 1.0}, {"b": 2.5} ]
and the expected result is this:
[{"a": 4.0}, {"b": 5}, {"c": 2.5}]
I have a function that actually does that, but its performance is very poor, and I need to call it for each row. At the moment, with 1.4 million rows, adding the function call raises the query time from 39 sec to 2 min 30 sec. We expect to grow to more than 50 million rows, which would take approximately 1 hour 40 min.
I'm really new to PostgreSQL and this is the best function I could come up with. I don't know if there is a more efficient way to do this.
This is my current function:
create or replace function join_and_sum(parameter1 jsonb, parameter2 jsonb, parameter3 jsonb, parameter4 jsonb) returns jsonb
language plpgsql
as
$$
DECLARE
    column_jsonb jsonb;
BEGIN
    -- concatenate the four arrays, explode them into key/value pairs,
    -- sum the values per key, and aggregate back into a jsonb array
    SELECT INTO column_jsonb jsonb_agg(p.obj)
    FROM (
        SELECT jsonb_build_object(key, SUM(value::float)) AS obj
        FROM (
            SELECT (jsonb_each_text(j)).*
            FROM jsonb_array_elements(parameter1 || parameter2 || parameter3 || parameter4) j
        ) j
        GROUP BY j.key
    ) p;
    RETURN column_jsonb;
END;
$$;
Thanks in advance.
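For comparison, here is a minimal sketch of the same aggregation written as a plain SQL function declared immutable and parallel safe; the name join_and_sum_sql is only illustrative, and the assumption (untested) is that avoiding the plpgsql layer and allowing parallel plans may reduce the per-row overhead:

create or replace function join_and_sum_sql(p1 jsonb, p2 jsonb, p3 jsonb, p4 jsonb)
    returns jsonb
    language sql
    immutable
    parallel safe
as
$$
    -- same logic as join_and_sum: concatenate the arrays, explode them into
    -- key/value pairs, sum per key, and aggregate back into a jsonb array
    select jsonb_agg(jsonb_build_object(key, total))
    from (
        select e.key, sum(e.value::float) as total
        from jsonb_array_elements(p1 || p2 || p3 || p4) as a(obj)
        cross join lateral jsonb_each_text(a.obj) as e(key, value)
        group by e.key
    ) s;
$$;

Note that, like the original, this returns NULL if any parameter is NULL, since || propagates NULLs; wrapping each parameter in coalesce(pN, '[]'::jsonb) would guard against that.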

Related

Concatenate Values of all Elements of An Array of Maps in Spark SQL

I am new to Spark SQL and I have a column of type array with data like below:
[{"X":"A11"},{"X":"A12"},{"X":"A13"}]
The output I am looking for is a string field as
A11, A12, A13
I cannot explode the array as I need the data in one row.
Since the maximum length of the array in my case is 6, I got it to work using the case statement below.
case
when size(arr)=1 then array_join(map_values(map_concat(arr[0])),',')
when size(arr)=2 then array_join(map_values(map_concat(arr[0],arr[1])),',')
when size(arr)=3 then array_join(map_values(map_concat(arr[0],arr[1],arr[2])),',')
when size(arr)=4 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3])),',')
when size(arr)=5 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3],arr[4])),',')
when size(arr)=6 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3],arr[4],arr[5])),',')
else
null
end
Is there a better way to do this?
Assuming that the source and result columns are col and values respectively, it can be implemented as follows:
from pyspark.sql import functions as F

# array<map<string,string>> sample matching the question
data = [
    ([{"X": "A11"}, {"X": "A12"}, {"X": "A13"}],)
]
df = spark.createDataFrame(data, ['col'])
# extract each map's values, flatten the array of arrays, and join into one string
df = df.withColumn('values', F.array_join(F.flatten(F.transform('col', lambda x: F.map_values(x))), ','))
df.show(truncate=False)
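For reference, the same idea can be expressed directly in Spark SQL (assuming Spark 2.4+ for the higher-order functions; the table name src is only illustrative):

SELECT array_join(flatten(transform(arr, x -> map_values(x))), ',') AS joined_values
FROM src

transform extracts the map values of each element, flatten collapses the resulting array of arrays, and array_join produces the comma-separated string.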

count jsonb array with condition in postgres

I have a postgres database where some column data are stored as follow:
guest_composition                                             | charging_age
--------------------------------------------------------------+-------------
[{"a": 1, "b": 1, "c": 1, "children_ages": [10, 5, 2, 0.1]}]  | 3
[{"a": 1, "b": 1, "c": 1, "children_ages": [2.5, 1, 4]}]      | 3
I want to go over the children_ages array and return the count of children above the age of 3. I am having a hard time using the array data because it is returned as jsonb and not as an int array.
The first row should return 2 because there are 2 children above the age of 3. The second row should return 1 because there is 1 child above the age of 3.
I have tried the following but it didn't work:
WITH reservation AS (
    SELECT jsonb_array_elements(reservations.guest_composition)->'children_ages' AS children_ages,
           charging_age
    FROM reservations
)
SELECT (CASE WHEN (reservations.charging_age IS NOT NULL AND reservation.children_ages IS NOT NULL)
             THEN SUM(CASE WHEN (reservation.children_ages)::int[] >= (reservations.charging_age)::int THEN 1 ELSE 0 END)
             ELSE 0 END) AS children_to_charge
You can extract an array of all child ages above the limit using a SQL/JSON path function:
select jsonb_path_query_array(r.guest_composition, '$[*].children_ages[*] ? (@ > 3)')
from reservations r;
The length of that array is then the count you are looking for:
select jsonb_array_length(jsonb_path_query_array(r.guest_composition, '$[*].children_ages[*] ? (@ > 3)'))
from reservations r;
It's unclear to me if charging_age is a column and could change in every row. If that is the case, you can pass a parameter to the JSON path function:
select jsonb_path_query_array(
r.guest_composition, '$[*].children_ages[*] ? (@ > $age)',
jsonb_build_object('age', charging_age)
)
from reservations r;
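Combining the two, a sketch of the per-row count using the per-row charging_age (assuming, as in the sample data, that charging_age is a column of reservations):

select jsonb_array_length(
           jsonb_path_query_array(
               r.guest_composition,
               '$[*].children_ages[*] ? (@ > $age)',
               jsonb_build_object('age', r.charging_age)
           )
       ) as children_to_charge
from reservations r;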

modify only last value in jsonb field

I have a table with a jsonb field.
Example:
id | jsonb_t
---+-----------------------------------------------------------
1  | [ {"x": 1, "y": 2}, {"x": 2, "y": 3}, {"x": 3, "y": 4} ]
2  | [ {"x": 1, "y": 3}, {"x": 3, "y": 3}, {"x": 8, "y": 2} ]
3  | [ {"x": 1, "y": 4}, {"x": 4, "y": 3}, {"x": 5, "y": 9} ]
I want to modify the row where id = 3, but only the last element of the jsonb array of objects: replace "y": 9 with "y": 8, and increment "x": 5 by 1 to "x": 6.
I can't figure out how to do it in one step (the replace and increment should be done "in place" because there are thousands of rows in the jsonb array field). Thanks in advance for your help.
You can use some jsonb functions for this, for example:
SELECT jsonb_agg(jsonb_build_object('x', x, 'y', y))
FROM (SELECT CASE
               WHEN row_number() over() = jsonb_array_length(jsonb_t) THEN x + 1
               ELSE x
             END AS x,
             CASE
               WHEN row_number() over() = jsonb_array_length(jsonb_t) THEN y - 1
               ELSE y
             END AS y
      FROM t, jsonb_to_recordset(jsonb_t) AS (x INT, y INT)
      WHERE id = 3) AS j
where jsonb_to_recordset expands the outermost array of objects into rows with integer columns; the last element is identified by comparing row_number() against jsonb_array_length(), x is incremented and y decremented for that element, and the jsonb value is then built back up with jsonb_agg in the main query.
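The query above only builds the new array; a minimal sketch of writing it back, assuming the table is named t as in the FROM clause above (like the original query, it relies on the function scan preserving array order for row_number()):

update t
set jsonb_t = (
    select jsonb_agg(jsonb_build_object('x', x, 'y', y))
    from (
        select case when row_number() over() = jsonb_array_length(t.jsonb_t) then x + 1 else x end as x,
               case when row_number() over() = jsonb_array_length(t.jsonb_t) then y - 1 else y end as y
        from jsonb_to_recordset(t.jsonb_t) as (x int, y int)
    ) j
)
where id = 3;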
Your sample data suggests that the column is in fact defined as jsonb, not jsonb[], and that the array is a proper JSON array (not an array of jsonb values).
If that is correct, then you can use jsonb_set() to extract and modify the value of the last array element:
update the_table
set jsonb_t = jsonb_set(jsonb_t,
array[jsonb_array_length(jsonb_t)-1]::text[],
jsonb_t -> jsonb_array_length(jsonb_t)-1 ||
'{"y":8}' ||
jsonb_build_object('x', (jsonb_t -> jsonb_array_length(jsonb_t)-1 ->> 'x')::int + 1)
)
where id = 3
As documented in the manual jsonb_set() takes three parameters: the input value, the path to the value that should be changed and the new value.
The second parameter array[jsonb_array_length(jsonb_t)-1]::text[] calculates the target position in the JSON array by taking its length and subtracting one to get the last element. This integer is then converted to a text array (which is the required type for the second parameter).
The expression jsonb_t -> jsonb_array_length(jsonb_t)-1 then picks that array element and appends '{"y":8}', which replaces the existing key/value pair for y. The expression
jsonb_build_object('x', (jsonb_t -> jsonb_array_length(jsonb_t)-1 ->> 'x')::int + 1)
extracts the current value of the x key, converts it to an integer, increments it by one, and builds a new JSON object with the key x, which is also appended to the old value, thus replacing the existing key.
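Assuming the sample row with id = 3 above, the result can be verified with:

select jsonb_t
from the_table
where id = 3;
-- expected: [{"x": 1, "y": 4}, {"x": 4, "y": 3}, {"x": 6, "y": 8}]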

How to check if an array is in a multidimensional array

I want to check, using a filtering query, whether an array is equal to an element of another multidimensional array, which can be considered an array of arrays.
For example:
Given the multidimensional array {{1,2}, {3,4}, {5,6}}, I want to check whether a one-dimensional array is equal to one of the array elements.
Expected results:
Input: {1,2} or {3,4} -> Output: TRUE
Input: {2,3} or {1,5} -> Output: FALSE
I've already tried <@, but it returns TRUE for all the example cases, and I can't use ANY without slicing the multidimensional array.
Does anyone have a solution without using plpgsql?
This does seem like a difficult problem to solve without any plpgsql. However, if this function is utilized, it is much simpler:
https://wiki.postgresql.org/wiki/Unnest_multidimensional_array
CREATE OR REPLACE FUNCTION public.reduce_dim(anyarray)
RETURNS SETOF anyarray AS
$function$
DECLARE
s $1%TYPE;
BEGIN
FOREACH s SLICE 1 IN ARRAY $1 LOOP
RETURN NEXT s;
END LOOP;
RETURN;
END;
$function$
LANGUAGE plpgsql IMMUTABLE;
To use:
create table array_test (arr integer[][]);
insert into array_test (select '{{1,2}, {3,4}, {5,6}}');
select (case when '{1,2}' in (select reduce_dim(arr) from array_test) then true
else false end);
case
------
t
(1 row)
select (case when '{1,4}' in (select reduce_dim(arr) from array_test) then true
else false end);
case
------
f
(1 row)
Simple way: search in array like in string:
select '{{1, 2}, {3, 4}, {5, 6}}'::int[][]::text like '%{1,2}%';
Complex way: decompose array to slices (without plpgsql):
with t(x) as (values('{{1, 2}, {3, 4}, {5, 6}}'::int[][]))
select *
from t
where (
select bool_or(x[s:s] = '{{1,3}}') from generate_subscripts(x,1) as s);

How to get and compare the elements of the jsonb array in Postgres?

Postgres 9.6.1
CREATE TABLE "public"."test" (
    "id" int4 NOT NULL,
    "packet" jsonb
)
WITH (OIDS=FALSE);
The jsonb value is either:
{"1": {"end": 14876555, "quantity":10}, "2": {"end": 14876555, "quantity":10} }
or
[{"op": 1, "end": 14876555, "quantity": 10}, {"op": 2, "end": 14876555, "quantity": 20}]
All attempts to retrieve an array result in an error:
cannot extract elements from an object
I need to compare all the "end" elements against "end" < 1490000 and find the matching ids.
The keys vary ("op": 1 in one form, "1", "2", ... in the other), so a solution relying on a fixed full path is not suitable.
If you have no agreed JSON structure, the best solution IMO is something like:
select *
from
public.test,
regexp_matches(packet::text, '"end":\s*(\d+)', 'g') as e(x)
where
x[1]::numeric < 1490000;
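For reference, a jsonb-native sketch that avoids the regex and handles both shapes shown above, assuming Postgres 9.6 as stated (no SQL/JSON path functions yet); it returns the ids where at least one "end" value matches, like the regex query:

select distinct t.id
from public.test t
cross join lateral jsonb_array_elements(
    case jsonb_typeof(t.packet)
        -- array form: use as-is
        when 'array' then t.packet
        -- object form: collect the values of {"1": {...}, "2": {...}} into an array
        else (select jsonb_agg(v.value) from jsonb_each(t.packet) v)
    end
) as elem(obj)
where (elem.obj ->> 'end')::numeric < 1490000;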
