Modify only the last value in a jsonb field (array of objects)

I have a table with a jsonb field.
Example:
id  jsonb_t
1   [ {"x": 1, "y": 2}, {"x": 2, "y": 3}, {"x": 3, "y": 4} ]
2   [ {"x": 1, "y": 3}, {"x": 3, "y": 3}, {"x": 8, "y": 2} ]
3   [ {"x": 1, "y": 4}, {"x": 4, "y": 3}, {"x": 5, "y": 9} ]
I want to modify the row where id = 3, but only the last element in the jsonb array of objects: replace e.g. "y": 9 with "y": 8, and increment "x": 5 by 1 to "x": 6.
I can't figure out how to do it in one step (the replace and increment should be done "in place", because there are thousands of rows in the jsonb[] array field). Thanks in advance for any help.

You can use some jsonb functions, like this:
SELECT jsonb_agg(jsonb_build_object('x', x, 'y', y))
FROM (SELECT CASE
                 WHEN row_number() over() = jsonb_array_length(jsonb_t) THEN x + 1
                 ELSE x
             END AS x,
             CASE
                 WHEN row_number() over() = jsonb_array_length(jsonb_t) THEN y - 1
                 ELSE y
             END AS y
        FROM t, jsonb_to_recordset(jsonb_t) AS (x int, y int)
       WHERE id = 3) AS j
Here jsonb_to_recordset expands the outermost array of objects into individual rows with integer columns; the last element is identified by comparing row_number() with jsonb_array_length(), its x and y values are incremented and decremented respectively, and the main query then builds the jsonb array back up with jsonb_build_object() and jsonb_agg().
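The query above only selects the rebuilt value. If you also want to write it back in the same statement, one option (a sketch, not part of the original answer) is to use it as a correlated subquery in an UPDATE:

UPDATE t
SET jsonb_t = (
    SELECT jsonb_agg(jsonb_build_object('x', x, 'y', y))
    FROM (SELECT CASE
                     WHEN row_number() over() = jsonb_array_length(t.jsonb_t) THEN x + 1
                     ELSE x
                 END AS x,
                 CASE
                     WHEN row_number() over() = jsonb_array_length(t.jsonb_t) THEN y - 1
                     ELSE y
                 END AS y
            FROM jsonb_to_recordset(t.jsonb_t) AS r(x int, y int)) AS j
)
WHERE id = 3;
-- note: this rewrites the whole array for the matched row, so it is not a true "in place" change of a single element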

Your sample data looks like the column is in fact defined as jsonb, not jsonb[], and the array is a proper JSON array (not an array of jsonb values).
If that is correct, then you can use jsonb_set() to extract and modify the value of the last array element:
update the_table
  set jsonb_t = jsonb_set(jsonb_t,
                          array[jsonb_array_length(jsonb_t)-1]::text[],
                          jsonb_t -> jsonb_array_length(jsonb_t)-1 ||
                            '{"y":8}' ||
                            jsonb_build_object('x', (jsonb_t -> jsonb_array_length(jsonb_t)-1 ->> 'x')::int + 1)
                         )
where id = 3
As documented in the manual, jsonb_set() takes three parameters: the input value, the path to the value that should be changed, and the new value.
The second parameter array[jsonb_array_length(jsonb_t)-1]::text[] calculates the target position in the JSON array by taking its length and subtracting one to get the last element. This integer is then converted to a text array (which is the required type for the second parameter).
The expression jsonb_t -> jsonb_array_length(jsonb_t)-1 then picks that array element and appends '{"y":8}', which replaces the existing key/value pair for y. The expression
jsonb_build_object('x', (jsonb_t -> jsonb_array_length(jsonb_t)-1 ->> 'x')::int + 1)
extracts the current value of the x key, converts it to an integer, increments it by one, and builds a new JSON object with the key x that is also appended to the old value, thus replacing the existing key.
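As a quick check (not part of the original answer), you can look at the last array element of row 3 after the update; with the sample data from the question it should now be {"x": 6, "y": 8}:

-- inspect the last element of the jsonb array for id = 3
SELECT jsonb_t -> (jsonb_array_length(jsonb_t) - 1) AS last_element
FROM the_table
WHERE id = 3;
-- expected after the update: {"x": 6, "y": 8}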

Related

Concatenate Values of all Elements of An Array of Maps in Spark SQL

I am new to Spark SQL and I have a column of type array with data like below:
[{"X":"A11"},{"X":"A12"},{"X":"A13"}]
The output I am looking for is a string field as
A11, A12, A13
I cannot explode the array as I need the data in one row.
Since the maximum length of the array in my case is 6, I got it to work using the case statement below.
case
when size(arr)=1 then array_join(map_values(map_concat(arr[0])),',')
when size(arr)=2 then array_join(map_values(map_concat(arr[0],arr[1])),',')
when size(arr)=3 then array_join(map_values(map_concat(arr[0],arr[1],arr[2])),',')
when size(arr)=4 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3])),',')
when size(arr)=5 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3],arr[4])),',')
when size(arr)=6 then array_join(map_values(map_concat(arr[0],arr[1],arr[2],arr[3],arr[4],arr[5])),',')
else
null
end
Is there a better way to do this?
Assuming that the source and result columns are col and values respectively, it can be implemented as follows:
from pyspark.sql import functions as F

# assumes an existing SparkSession named `spark`
data = [
    ([{"X": "A11"}, {"X": "A12"}, {"X": "A13"}],)
]
df = spark.createDataFrame(data, ['col'])
df = df.withColumn('values', F.array_join(F.flatten(F.transform('col', lambda x: F.map_values(x))), ','))
df.show(truncate=False)
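If you would rather stay in the SQL API (as in the CASE expression above), the same idea can be expressed with transform, flatten, map_values and array_join; a sketch, assuming the array column is called arr and using a placeholder table name:

SELECT array_join(flatten(transform(arr, x -> map_values(x))), ', ') AS joined_values
FROM your_table  -- hypothetical table name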

count jsonb array with condition in postgres

I have a postgres database where some column data are stored as follows:

guest_composition                                               charging_age
[{"a": 1, "b": 1, "c": 1, "children_ages": [10, 5, 2, 0.1]}]    3
[{"a": 1, "b": 1, "c": 1, "children_ages": [2.5, 1, 4]}]        3

I want to go over the children_ages array and return the count of children that are above the age of 3. I am having a hard time using the array data because it is returned as jsonb and not as an int array.
The first row should return 2 because there are 2 children above the age of 3. The second row should return 1 because there is 1 child above the age of 3.
I have tried the following but it didn't work:
WITH reservation AS (
    SELECT jsonb_array_elements(reservations.guest_composition) -> 'children_ages' AS children_ages,
           charging_age
    FROM reservations
)
SELECT (CASE
            WHEN (reservations.charging_age IS NOT NULL AND reservation.children_ages IS NOT NULL) THEN
                SUM(CASE WHEN (reservation.children_ages)::int[] >= (reservations.charging_age)::int THEN 1 ELSE 0 END)
            ELSE 0
        END) AS children_to_charge
You can extract an array of all child ages using a SQL JSON path function:
select jsonb_path_query_array(r.guest_composition, '$[*].children_ages[*] ? (@ > 3)')
from reservations r;
The length of that array is then the count you are looking for:
select jsonb_array_length(jsonb_path_query_array(r.guest_composition, '$[*].children_ages[*] ? (@ > 3)'))
from reservations r;
It's unclear to me if charging_age is a column and could change in every row. If that is the case, you can pass a parameter to the JSON path function:
select jsonb_path_query_array(
           r.guest_composition, '$[*].children_ages[*] ? (@ > $age)',
           jsonb_build_object('age', charging_age)
       )
from reservations r;
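Putting the two pieces together, a per-row count with the dynamic age could look like this (a sketch, assuming charging_age is a column of reservations as in the sample data):

select jsonb_array_length(
           jsonb_path_query_array(
               r.guest_composition,
               '$[*].children_ages[*] ? (@ > $age)',
               jsonb_build_object('age', r.charging_age)
           )
       ) as children_to_charge
from reservations r;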

optimize summing arrays of objects by key in postgresql

I need a PostgreSQL function that merges and sums (by key) 4 jsonb arrays of objects. Each jsonb value can have 0 or more objects:
parameter 1: [ {"a": 1.0}, {"b": 2.5} ]
parameter 2: [ {"a": 1.0} ]
parameter 3: [ {"a": 1.0}, {"c": 2.5} ]
parameter 4: [ {"a": 1.0}, {"b": 2.5} ]
and the result expected is this:
[{"a": 4.0}, {"b": 5}, {"c": 2.5}]
I have a function that actually does that, but its performance is very poor, and I need to call it for each row. At the moment, with 1.4 million rows, adding the function call raises the runtime from 39 sec to 2 min 30 sec. We expect to grow to more than 50 million results, which would take about 1 hour 40 min.
I'm really new to PostgreSQL and this is the best function I could come up with. I don't know if there is a more efficient way to do this.
This is my current function:
create or replace function join_and_sum(parameter1 jsonb, parameter2 jsonb, parameter3 jsonb, parameter4 jsonb) returns jsonb
    language plpgsql
as
$$
DECLARE
    column_jsonb jsonb;
BEGIN
    select into column_jsonb jsonb_agg(p.jsonb_build_object)
    from (
        SELECT jsonb_build_object(key, SUM(value::float))
        FROM (
            SELECT (JSONB_EACH_TEXT(j)).*
            from jsonb_array_elements(parameter1 || parameter2 || parameter3 || parameter4) j
        ) j
        group by j.key
    ) p;
    RETURN column_jsonb;
END;
$$;
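For reference, calling it with the four sample parameters from above should produce the expected shape (key order inside the result array may vary):

SELECT join_and_sum(
    '[{"a": 1.0}, {"b": 2.5}]',
    '[{"a": 1.0}]',
    '[{"a": 1.0}, {"c": 2.5}]',
    '[{"a": 1.0}, {"b": 2.5}]'
);
-- roughly: [{"a": 4}, {"b": 5}, {"c": 2.5}]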
thanks in advance

Moving PostgreSQL bigint array unique value to another index

How can I move a bigint value in an array from one index to another? For example, I have an array ARRAY[1, 2, 3, 4] of unique bigint values and want to move the value 1 to index 3, so the final result would be ARRAY[2, 3, 1, 4].
The assumptions:
Elements in the array are identified by their value.
The uniqueness of the elements is guaranteed.
Any element can be moved to any place.
Null values are not involved on any side.
The value is contained in the array; if not, there are 2 options: either do nothing, or handle it via the exception mechanism. It's an extreme case that can only happen because of some bug.
Arrays are 1-dimensional.
General assumptions:
Array elements are UNIQUE NOT NULL.
Arrays are 1-dimensional with standard subscripts (1..N). See:
Normalize array subscripts for 1-dimensional array so they start with 1
Simple solution
CREATE FUNCTION f_array_move_element_simple(_arr bigint[], _elem bigint, _pos int)
RETURNS bigint[] LANGUAGE sql IMMUTABLE AS
'SELECT a1[:_pos-1] || _elem || a1[_pos:] FROM array_remove(_arr, _elem) a1'
All fine & dandy, as long as:
The given element is actually contained in the array.
The given position is between 1 and the length of the array.
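For example, with the sample array from the question (moving the value 1 to position 3):

SELECT f_array_move_element_simple('{1,2,3,4}'::bigint[], 1, 3);
-- result: {2,3,1,4}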
Proper solution
CREATE FUNCTION f_array_move_element(_arr ANYARRAY, _elem ANYELEMENT, _pos int)
  RETURNS ANYARRAY AS
$func$
BEGIN
   IF _pos IS NULL OR _pos < 1 THEN
      RAISE EXCEPTION 'Target position % not allowed. Must be a positive integer.', _pos;
   ELSIF _pos > array_length(_arr, 1) THEN
      RAISE EXCEPTION 'Target position % not allowed. Cannot be greater than length of array.', _pos;
   END IF;

   CASE array_position(_arr, _elem) = _pos  -- already in position, return org
   WHEN true THEN
      RETURN _arr;
   WHEN false THEN  -- remove element
      _arr := array_remove(_arr, _elem);
   ELSE  -- element not found
      RAISE EXCEPTION 'Element % not contained in array %!', _elem, _arr;
   END CASE;

   RETURN _arr[:_pos-1] || _elem || _arr[_pos:];
END
$func$ LANGUAGE plpgsql IMMUTABLE;
Exceptions are raised if any of the additional assumptions for the simple func are violated.
The "proper" function uses polymorphic types and works for any data type, not just bigint - as long as array and element type match.
PostgreSQL supports slicing and appending, so:
SELECT c, c[2:3] || c[1] || c[4:] AS result
FROM (SELECT ARRAY[1, 2, 3, 4] c) sub
Another variant, using WITH ... SELECT ..., avoids searching for elements by value and works with array element positions only; this is useful e.g. for jsonb[] columns holding big JSON values.
test_model.data - the field to update.
:idx_from, :idx_to - placeholders, 1-based.
WITH from_removed AS (
    SELECT
        test_model.id,
        ARRAY_CAT(
            data[: :idx_from - 1],
            data[:idx_from + 1 :]
        ) AS "d"
    FROM test_model
)
UPDATE test_model AS source
SET data =
    from_removed.d[: :idx_to - 1] ||
    data[:idx_from : :idx_from] ||
    from_removed.d[:idx_to :]
FROM from_removed
WHERE source.id = from_removed.id AND source.id = :id
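As a standalone check of the same slice arithmetic (hypothetical literal values idx_from = 1 and idx_to = 3, not part of the original answer):

-- "d" is the array with position 1 removed, as in the CTE above
SELECT d[: 3 - 1] || c[1 : 1] || d[3 :] AS result
FROM (
    SELECT c, c[: 1 - 1] || c[1 + 1 :] AS d
    FROM (SELECT ARRAY[1, 2, 3, 4] AS c) s
) t;
-- result: {2,3,1,4}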

How to sort a Spark RDD of arrays in ascending order by any column in Scala?

I'm interested in Apache Spark.
I'm trying to sort an RDD of arrays (RDD[Array[Int]]) in ascending order by an arbitrary column, in Scala.
(i.e. for RDD[Array[Int]] -> Array(Array(1,2,3), Array(2,3,4), Array(1,2,1)):
If I sort by the first column, the result will be Array(Array(1,2,3), Array(1,2,1), Array(2,3,4)); if I sort by the third column, the result will be Array(Array(1,2,1), Array(1,2,3), Array(2,3,4)).
)
I then want the result to have the return type RDD[Array[Int]].
Is there a way to do this, e.g. using the map() or filter() functions?
Use RDD.sortBy:
// sorting by second column (index = 1)
val result: RDD[Array[Int]] = rdd.sortBy(_(1), ascending = true)
The sorting function can also be written using Pattern Matching:
val result: RDD[Array[Int]] = rdd.sortBy( {
case Array(a, b, c) => b /* choose column(s) to sort by */
}, ascending = true)
Also note the ascending argument's default value is true, so you can drop it and get the same result:
val result: RDD[Array[Int]] = rdd.sortBy(_(1))
val baseRdd = sc.parallelize(Array(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))
// false specifies descending order
val result = baseRdd.sortBy(x => x(1), false)
result.foreach { x => println(x(0) + "\t" + x(1) + "\t" + x(2)) }
Result
2 3 4
1 2 3
1 2 1
