PostgreSQL json and array processing

PostgreSQL json and array processing - arrays

I need to output JSON from a query.
Input data:
Documents:
==========
id | name | team
------------------
1 | doc1 | {"authors": [1, 2, 3], "editors": [3, 4, 5]}
Persons:
========
id | name |
--------------
1 | Person1 |
2 | Person2 |
3 | Person3 |
4 | Person4 |
5 | Person5 |
Query:
select d.id, d.name,
(select jsonb_build_object(composed)
from
(
select teamGrp.key,
(
select json_build_array(persAgg) from
(
select
(
select jsonb_agg(pers) from
(
select person.id, person.name
from
persons
where (persList.value)::int=person.id
) pers
)
from
json_array_elements_text(teamGrp.value::json) persList
) persAgg
)
from
jsonb_each_text(d.team) teamGrp
) teamed
) as teams
from
documents d;
and I expect the following output:
{"id": 1, "name": "doc1", "teams":
{"authors": [{"id": 1, "name": "Person1"}, {"id": 2, "name": "Person2"}, {"id": 3, "name": "Person3"}],
"editors": [{"id": 3, "name": "Person3"}, {"id": 4, "name": "Person4"}, {"id": 5, "name": "Person5"}]}}
But I received an error:
ERROR: more than one row returned by a subquery used as an expression
Where is the problem, and how can I fix it?
PostgreSQL 9.5

I think the following (admittedly complicated) query should do it:
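SELECT
    json_build_object(
        'id', id,
        'name', name,
        'teams', (
            -- one key per team: {"authors": [...], "editors": [...]}
            SELECT json_object_agg(team_name,
                (SELECT
                     -- one {"id": ..., "name": ...} object per team member
                     json_agg(json_build_object('id', value, 'name', Persons.name))
                 FROM json_array_elements(team_members)
                 INNER JOIN Persons ON (value#>>'{}')::integer = Persons.id
                )
            )
            -- json_each() only accepts json, so cast in case team is jsonb
            FROM json_each(team::json) t(team_name, team_members)
        )
    )
FROM Documents;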
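The JSON aggregates run inside the subqueries: each subquery aggregates all of its rows into a single JSON value, so it returns exactly one row, which is what a subquery used as an expression requires.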
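To see what each level of the query contributes, here is a plain-Python model of the same computation (a sketch for illustration, not a replacement for the SQL). The in-memory `documents` and `persons` structures stand in for the tables, and `build_document` is a hypothetical helper name. One difference: Python keeps the member order of the array, whereas `json_agg` without an `ORDER BY` does not guarantee any order.

```python
import json

# Sample rows from the question, held in memory instead of in tables.
documents = [
    {"id": 1, "name": "doc1", "team": {"authors": [1, 2, 3], "editors": [3, 4, 5]}},
]
persons = {1: "Person1", 2: "Person2", 3: "Person3", 4: "Person4", 5: "Person5"}


def build_document(doc):
    """Build one output object the way the nested SQL subqueries do."""
    teams = {}
    # Outer subquery: one key per team (json_object_agg over json_each(team)).
    for team_name, member_ids in doc["team"].items():
        # Inner subquery: one {"id", "name"} object per member that has a
        # matching person; the INNER JOIN drops unknown ids.
        teams[team_name] = [
            {"id": pid, "name": persons[pid]} for pid in member_ids if pid in persons
        ]
    # Top level: json_build_object('id', ..., 'name', ..., 'teams', ...).
    return {"id": doc["id"], "name": doc["name"], "teams": teams}


for doc in documents:
    print(json.dumps(build_document(doc)))
```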

Related

Postgres aggregate nested jsonb array values

In Postgres 11.x I am trying to aggregate elements in a nested jsonb object which has an array field into a single row per device_id. Here's example data for a table called configurations.
id | device_id | data
---+-----------+------------------------------------------------------------------------
1 | 1 | {"sensors": [{"other_data": {}, "sensor_type": 1}], "other_data": {}}
2 | 1 | {"sensors": [{"other_data": {}, "sensor_type": 1}, {"other_data": {}, "sensor_type": 2}], "other_data": {}}
3 | 1 | {"sensors": [{"other_data": {}, "sensor_type": 3}], "other_data": {}}
4 | 2 | {"sensors": [{"other_data": {}, "sensor_type": 4}], "other_data": {}}
5 | 2 | {"sensors": null, "other_data": {}}
6 | 3 | {"sensors": [], "other_data": {}}
My goal output would have a single row per device_id with an array of distinct sensor_types, example:
device_id | sensor_types
----------+-------------
1 | [1,2,3]
2 | [4]
3 | [] (null would also be fine here)
I've tried a bunch of things but keep running into various problems. Here's some SQL to set up a test environment:
CREATE TEMPORARY TABLE configurations(
id SERIAL PRIMARY KEY,
device_id INTEGER,
data JSONB
);
INSERT INTO configurations(device_id, data) VALUES
(1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} } ] }'),
(1, '{ "other_data": {}, "sensors": [ { "sensor_type": 1, "other_data": {} }, { "sensor_type": 2, "other_data": {} }] }'),
(1, '{ "other_data": {}, "sensors": [ { "sensor_type": 3, "other_data": {} }] }'),
(2, '{ "other_data": {}, "sensors": [ { "sensor_type": 4, "other_data": {} }] }'),
(2, '{ "other_data": {}, "sensors": null }'),
(3, '{ "other_data": {}, "sensors": [] }');
Quick note, my real table has about 100,000 rows and the jsonb data is much more complicated but follows this general structure.

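A JSON null inside a jsonb value causes problems in Postgres (for example, jsonb_array_elements() raises an error on it because it cannot extract elements from a scalar) and is best avoided when possible. You can convert the value to an empty array with the expression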
coalesce(nullif(data->'sensors', 'null'), '[]')
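As an illustration of what the expression does, here is a small Python sketch operating on already-parsed documents; `sensors_or_empty` is a hypothetical name. In SQL, `data->'sensors'` yields a jsonb `null` for an explicit null and SQL NULL for a missing key, and the `nullif`/`coalesce` pair turns both into `[]`; Python's `dict.get` returns `None` in both cases, so one check covers them.

```python
import json


def sensors_or_empty(data):
    """Mimic coalesce(nullif(data->'sensors', 'null'), '[]') on a parsed document."""
    # data->'sensors': a JSON null and a missing key both come back as None here.
    sensors = data.get("sensors")
    # nullif(..., 'null') + coalesce(..., '[]'): anything null becomes an empty array.
    return [] if sensors is None else sensors


for raw in ('{"sensors": [{"sensor_type": 4}]}', '{"sensors": null}', '{"sensors": []}'):
    print(raw, "->", sensors_or_empty(json.loads(raw)))
```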
The first attempt:
select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
group by device_id;
device_id | sensor_types
-----------+--------------
1 | {1,2,3}
2 | {4,NULL}
3 | {NULL}
(3 rows)
This may be unsatisfactory because of the nulls in the result. When we try to remove them:
select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
where value is not null
group by device_id;
device_id | sensor_types
-----------+--------------
1 | {1,2,3}
2 | {4}
(2 rows)
device_id = 3 disappears. We can get it back by selecting all device_ids from the table and left-joining the aggregated result:
select distinct device_id, sensor_types
from configurations
left join (
select device_id, array_agg(distinct value->'sensor_type') as sensor_types
from configurations
left join jsonb_array_elements(coalesce(nullif(data->'sensors', 'null'), '[]')) on true
where value is not null
group by device_id
) s
using(device_id);
device_id | sensor_types
-----------+--------------
1 | {1,2,3}
2 | {4}
3 |
(3 rows)
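The logic of the final query can be sketched in Python over the question's sample data (the function name `sensor_types_by_device` is hypothetical). Every device is registered first, which plays the role of the outer `select distinct device_id` plus `left join`, and only non-null sensor elements are aggregated. One difference to keep in mind: the sketch returns an empty list for device 3, where the SQL returns NULL.

```python
# (device_id, data) pairs from the question's INSERT statement.
configurations = [
    (1, {"other_data": {}, "sensors": [{"sensor_type": 1, "other_data": {}}]}),
    (1, {"other_data": {}, "sensors": [{"sensor_type": 1, "other_data": {}},
                                       {"sensor_type": 2, "other_data": {}}]}),
    (1, {"other_data": {}, "sensors": [{"sensor_type": 3, "other_data": {}}]}),
    (2, {"other_data": {}, "sensors": [{"sensor_type": 4, "other_data": {}}]}),
    (2, {"other_data": {}, "sensors": None}),
    (3, {"other_data": {}, "sensors": []}),
]


def sensor_types_by_device(rows):
    """Distinct sensor types per device, keeping devices that have none."""
    found = {}
    for device_id, data in rows:
        # Register every device, like selecting distinct device_id from the table.
        types = found.setdefault(device_id, set())
        # A JSON null (None) is treated like an empty array.
        for sensor in data.get("sensors") or []:
            types.add(sensor["sensor_type"])  # array_agg(distinct ...)
    # Sorted lists for stable output; device 3 ends up with [] instead of NULL.
    return {device_id: sorted(types) for device_id, types in found.items()}


print(sensor_types_by_device(configurations))
```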

Postgresql array to json

I have a table like:
id | time_serie | value
---+---------------------+------
1 | 2020-09-25 00:00:00 | 100
1 | 2020-09-25 00:10:00 | 200
1 | 2020-09-25 00:20:00 | 300
1 | 2020-09-25 00:30:00 | 400
I want a JSON output as:
{
"ID": 1,
"time_serie": [
{
"position": 1,
"inQuantity": 100
},
{
"position": 2,
"inQuantity": 200
},
{
"position": 3,
"inQuantity": 300
},
{
"position": 4,
"inQuantity": 400
}
...
]
}
Thanks

You can combine the JSON functions with the ROW_NUMBER() window function to generate the positions, for example:
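SELECT *
FROM
(
    SELECT JSON_BUILD_OBJECT('ID', id,
               'time_serie',
               JSON_AGG(
                   -- use the row number, not the id, as the position
                   JSON_BUILD_OBJECT('position', rn, 'inQuantity', value)
                   ORDER BY rn  -- keep the array in time order
               )
           )
    -- t is your table; number the rows of each id by time_serie
    FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_serie) AS rn FROM t) AS t
    GROUP BY id
) AS j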
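The position numbering can be checked with a small Python sketch over the sample rows (`build_series` is a hypothetical helper). Sorting by `(id, time_serie)` and enumerating each group from 1 corresponds to `ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_serie)`, and the list comprehension corresponds to the ordered `JSON_AGG`.

```python
from datetime import datetime
from itertools import groupby

# (id, time_serie, value) rows from the question.
rows = [
    (1, datetime(2020, 9, 25, 0, 0), 100),
    (1, datetime(2020, 9, 25, 0, 10), 200),
    (1, datetime(2020, 9, 25, 0, 20), 300),
    (1, datetime(2020, 9, 25, 0, 30), 400),
]


def build_series(rows):
    """One object per id, with positions numbered in time order."""
    result = []
    # Sorting by (id, time_serie) makes each id one contiguous, time-ordered run.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for series_id, group in groupby(ordered, key=lambda r: r[0]):
        result.append({
            "ID": series_id,
            # enumerate(start=1) numbers rows per id, like
            # ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_serie).
            "time_serie": [
                {"position": pos, "inQuantity": value}
                for pos, (_, _, value) in enumerate(group, start=1)
            ],
        })
    return result


print(build_series(rows))
```

Because the rows are sorted first, the positions come out the same regardless of the order in which the rows are read.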

How to merge two json arrays by property id (SQL Server)

I have two columns of json that I would like to join on id into a single select.
Sample Data
| a | b |
+------------------------------------------------+-------------------------------------+
| [{id: 1, name: "Alice"},{id:2, name: "Bob"}] | [{id: 1, age: 30}, {id:2, age: 32}] |
| [{id: 5, name: "Charlie"},{id:6, name: "Dale"}] | [{id: 5, age: 20}, {id:6, age: 14}] |
Desired Output
| c |
+-------------------------------------------------------------------+
| [{id: 1, name: "Alice", age: 30},{id:2, name: "Bob", age: 32}] |
| [{id: 5, name: "Charlie", age: 20},{id:6, name: "Dale", age: 14}] |
I'd like to do something like
select
sd.id,
sd.name,
x.age
from openJson(select a from someDb) sd
with (
id int '$.id',
name varchar(50) '$.name'
)
inner join (
select
id,
age
from openJson(select b from someDb)
with (
id int '$.id',
age int '$.age'
)
) x
on x.id = sd.id

I don't think current versions of SQL Server have a function that merges JSON objects. The closest built-in option is the JSON_MODIFY() function, which can update the value of an existing property, insert a new key:value pair, or delete a key.
In your case, a more appropriate approach is to parse the stored JSON as tables using OPENJSON() with an explicit schema, join the tables, and then rebuild the required JSON output:
SELECT
c = (
SELECT a.id, a.name, b.age
FROM OPENJSON(v.a) WITH (
id int '$.id',
name varchar(50) '$.name'
) a
FULL JOIN OPENJSON(v.b) WITH (
id int '$.id',
age int '$.age'
) b ON a.id = b.id
FOR JSON PATH
)
FROM (VALUES
('[{"id": 1, "name": "Alice"}, {"id":2, "name": "Bob"}]', '[{"id": 1, "age": 30}, {"id":2, "age": 32}]'),
('[{"id": 5, "name": "Charlie"}, {"id":6, "name": "Dale"}]', '[{"id": 5, "age": 20}, {"id":6, "age": 14}]')
) v (a, b)
Result:
c
--------------------------------------------------------------------
[{"id":1,"name":"Alice","age":30},{"id":2,"name":"Bob","age":32}]
[{"id":5,"name":"Charlie","age":20},{"id":6,"name":"Dale","age":14}]
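The same parse, join, and rebuild idea can be sketched in Python for comparison (`merge_by_id` is a hypothetical name, and this is not a T-SQL feature). One design difference: the T-SQL query selects `a.id`, so an id present only in `b` would come back without an `"id"` property (FOR JSON PATH omits NULL values by default), whereas this sketch keeps the id from whichever side has it.

```python
import json


def merge_by_id(a_json, b_json):
    """Merge two JSON arrays of objects on their "id" property."""
    a_rows = {item["id"]: item for item in json.loads(a_json)}
    b_rows = {item["id"]: item for item in json.loads(b_json)}
    # Like FULL JOIN: every id from a, then ids that only appear in b.
    ids = list(a_rows) + [i for i in b_rows if i not in a_rows]
    merged = []
    for i in ids:
        row = dict(a_rows.get(i, {}))  # properties from a (id, name)
        row.update(b_rows.get(i, {}))  # properties from b (age); "id" is kept
        merged.append(row)
    # Compact separators, similar to the FOR JSON PATH result.
    return json.dumps(merged, separators=(",", ":"))


print(merge_by_id('[{"id": 1, "name": "Alice"}, {"id":2, "name": "Bob"}]',
                  '[{"id": 1, "age": 30}, {"id":2, "age": 32}]'))
```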

Solr DIH: documents are not nested

I'm trying to import some data as nested documents.
I've tried to simplify my problem as much as possible.
Here is my parent query:
SELECT 1 AS id,
'xxx' AS col1,
'yyy' AS col2
UNION
SELECT 2 AS id,
'xxx' AS col1,
'yyy' AS col2
UNION
SELECT 3 AS id,
'xxx' AS col1,
'yyy' AS col2
The resulting data:
1 | xxx | yyy
2 | xxx | yyy
3 | xxx | yyy
My child query is:
SELECT 1 AS id,
1 AS rel_id,
'aaa' AS col1,
'bbb' AS col2
UNION
SELECT 2 AS id,
1 AS rel_id,
'aaa' AS col1,
'bbb' AS col2
UNION
SELECT 3 AS id,
2 AS rel_id,
'aaa' AS col1,
'bbb' AS col2
UNION
SELECT 4 AS id,
3 AS rel_id,
'aaa' AS col1,
'bbb' AS col2
The resulting data:
1 | 1 | aaa | bbb
2 | 1 | aaa | bbb
3 | 2 | aaa | bbb
4 | 3 | aaa | bbb
I'm trying to nest using this DIH configuration:
<entity
name="item"
query="select 1 as id, 'xxx' as col1, 'yyy' as col2 union select 2 as id, 'xxx' as col1, 'yyy' as col2 union select 3 as id, 'xxx' as col1, 'yyy' as col2">
<field column="id" name="id"/>
<field column="col1" name="column1_s" />
<field column="col2" name="column2_s" />
<entity
name="autor"
child="true"
query="select 1 as id, 1 as rel_id, 'aaa' as col1, 'bbb' as col2 union select 2 as id, 1 as rel_id, 'aaa' as col1, 'bbb' as col2 union select 3 as id, 2 as rel_id, 'aaa' as col1, 'bbb' as col2 union select 4 as id, 3 as rel_id, 'aaa' as col1, 'bbb' as col2"
cacheKey="rel_id" cacheLookup="item.id" cacheImpl="SortedMapBackedCache">
<field column="node_type" template="autor"/>
<field column="alt_code" name="id" template="${autor.id}-${autor.rel_id}"/>
<field column="col1" name="column1_s" />
<field column="col2" name="column2_s" />
</entity>
</entity>
However, they are not nested:
$ curl "http://localhost:8983/solr/arxius/select?q=*%3A*"
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "*:*"
}
},
"response": {
"numFound": 7,
"start": 0,
"numFoundExact": true,
"docs": [
{
"id": "1",
"column2_s": "bbb",
"column1_s": "aaa",
"_version_": 1682901366056419300
},
{
"id": "2",
"column2_s": "bbb",
"column1_s": "aaa",
"_version_": 1682901366056419300
},
{
"id": "1",
"column2_s": "yyy",
"column1_s": "xxx",
"_version_": 1682901366056419300
},
{
"id": "3",
"column2_s": "bbb",
"column1_s": "aaa",
"_version_": 1682901366058516500
},
{
"id": "2",
"column2_s": "yyy",
"column1_s": "xxx",
"_version_": 1682901366058516500
},
{
"id": "4",
"column2_s": "bbb",
"column1_s": "aaa",
"_version_": 1682901366058516500
},
{
"id": "3",
"column2_s": "yyy",
"column1_s": "xxx",
"_version_": 1682901366058516500
}
]
}
}
As you can see, the documents are not nested.
I've been struggling with this issue for a while and have tried to simplify the problem as much as possible.
I hope I've explained it well.
Any ideas?

Unnesting a list of JSON objects in PostgreSQL

I have a TEXT column in my PostgreSQL (9.6) database containing a list of one or more dictionaries, like these:
[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
or
[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]
I would like to retrieve each dictionary from the column and parse it into separate columns.
For this example:
id | customer | blurb
---+----------+------
1 | Joe | [{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
2 | Sally | [{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]
would become:
id | customer | line_total_excl_vat | account | product_id | ...
---+----------+----------------------+---------+------------
1 | Joe | 583.3300 | null| 5532548
2 | Sally | 500.0000 | null| null
3 | Sally | 250.0000 | null| 5532632

If you know beforehand which fields you want to extract, cast the text to json / jsonb and use json_to_recordset / jsonb_to_recordset. Note that this method requires the field names and types to be specified explicitly; fields in the JSON dictionaries that are not listed will not be extracted.
See the official PostgreSQL documentation on JSON functions.
Self-contained example:
WITH tbl (id, customer, dat) as ( values
(1, 'Joe',
'[{ "line_total_excl_vat": "583.3300"
, "account": ""
, "subtitle": ""
, "product_id": 5532548
, "price_per_unit": "583.3333"
, "line_total_incl_vat": "700.0000"
, "text": "PROD0008"
, "amount": "1.0000"
, "vat_rate": "20"}]')
,(2, 'Sally',
'[{ "line_total_excl_vat": "500.0000"
, "account": ""
, "subtitle": ""
, "product_id": ""
, "price_per_unit": "250.0000"
, "line_total_incl_vat": "600.0000"
, "text": "PROD003"
, "amount": "2.0000"
, "vat_rate": "20"}
, { "line_total_excl_vat": "250.0000"
, "account": ""
, "subtitle": ""
, "product_id": 5532632
, "price_per_unit": "250.0000"
, "line_total_incl_vat": "300.0000"
, "text": "PROD005"
, "amount": "1.0000"
, "vat_rate": "20"}]')
)
SELECT id, customer, x.*
FROM tbl
, json_to_recordset(dat::json) x
( line_total_excl_vat numeric
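, account text
, subtitle text
, product_id text
, price_per_unit numeric
, line_total_incl_vat numeric
, "text" text
, amount numeric
, vat_rate numeric
)
produces the following output:
id | customer | line_total_excl_vat | account | subtitle | product_id | price_per_unit | line_total_incl_vat | text | amount | vat_rate
---+----------+---------------------+---------+----------+------------+----------------+---------------------+----------+--------+---------
1 | Joe | 583.33 | | | 5532548 | 583.3333 | 700 | PROD0008 | 1 | 20
2 | Sally | 500 | | | | 250 | 600 | PROD003 | 2 | 20
2 | Sally | 250 | | | 5532632 | 250 | 300 | PROD005 | 1 | 20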
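The behaviour of a column definition list can be sketched in Python: only the listed fields are extracted and converted, a missing key becomes NULL (`None`), and unlisted fields are dropped. The `COLUMNS` mapping and `to_wide_rows` helper are hypothetical, and the rows are a trimmed copy of the question's data (`vat_rate` is kept but deliberately not listed, to show that it is ignored).

```python
import json
from decimal import Decimal

# Trimmed copy of the question's rows: (id, customer, JSON text).
tbl = [
    (1, "Joe", '[{"line_total_excl_vat": "583.3300", "account": "", "product_id": 5532548, "text": "PROD0008", "vat_rate": "20"}]'),
    (2, "Sally", '[{"line_total_excl_vat": "500.0000", "account": "", "product_id": "", "text": "PROD003", "vat_rate": "20"}, '
                 '{"line_total_excl_vat": "250.0000", "account": "", "product_id": 5532632, "text": "PROD005", "vat_rate": "20"}]'),
]

# Stand-in for the "AS x (...)" column list: field name -> type conversion.
# Fields not listed here (vat_rate in this sketch) are not extracted.
COLUMNS = {
    "line_total_excl_vat": Decimal,
    "account": str,
    "product_id": str,
    "text": str,
}


def to_wide_rows(rows, columns):
    result = []
    for row_id, customer, blurb in rows:
        for element in json.loads(blurb):  # one output row per array element
            record = {"id": row_id, "customer": customer}
            for name, convert in columns.items():
                value = element.get(name)  # a missing key becomes None (SQL NULL)
                record[name] = None if value is None else convert(str(value))
            result.append(record)
    return result


for record in to_wide_rows(tbl, COLUMNS):
    print(record)
```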
This format is often referred to as the wide format.
It is also possible to extract the data in a long format, which has the additional benefit of keeping all the data without naming the fields explicitly. In this case, the query may be written as follows (the test data is elided for brevity):
SELECT id, customer, y.key, y.value, x.record_number
FROM tbl
, lateral json_array_elements(dat::json) WITH ORDINALITY AS x (val, record_number)
, lateral json_each_text(x.val) y
WITH ORDINALITY in the above statement adds a sequence number for each element of the unnested array; it is used to tell apart the fields of different dictionaries belonging to the same customer.
This produces the following output:
id | customer | key | value | record_number
---+----------+---------------------+----------+--------------
1 | Joe | line_total_excl_vat | 583.3300 | 1
1 | Joe | account | | 1
1 | Joe | subtitle | | 1
1 | Joe | product_id | 5532548 | 1
1 | Joe | price_per_unit | 583.3333 | 1
1 | Joe | line_total_incl_vat | 700.0000 | 1
1 | Joe | text | PROD0008 | 1
1 | Joe | amount | 1.0000 | 1
1 | Joe | vat_rate | 20 | 1
2 | Sally | line_total_excl_vat | 500.0000 | 1
2 | Sally | account | | 1
2 | Sally | subtitle | | 1
2 | Sally | product_id | | 1
2 | Sally | price_per_unit | 250.0000 | 1
2 | Sally | line_total_incl_vat | 600.0000 | 1
2 | Sally | text | PROD003 | 1
2 | Sally | amount | 2.0000 | 1
2 | Sally | vat_rate | 20 | 1
2 | Sally | line_total_excl_vat | 250.0000 | 2
2 | Sally | account | | 2
2 | Sally | subtitle | | 2
2 | Sally | product_id | 5532632 | 2
2 | Sally | price_per_unit | 250.0000 | 2
2 | Sally | line_total_incl_vat | 300.0000 | 2
2 | Sally | text | PROD005 | 2
2 | Sally | amount | 1.0000 | 2
2 | Sally | vat_rate | 20 | 2
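The long-format query can be mirrored in Python to make the roles of the two functions explicit: `enumerate(..., start=1)` plays the part of `WITH ORDINALITY`, and iterating over each dictionary's items plays the part of `json_each_text`. The helpers `as_text` and `to_long_rows` are hypothetical, and the rows are a trimmed copy of the question's data.

```python
import json

# Trimmed copy of the question's rows: (id, customer, JSON text).
tbl = [
    (1, "Joe", '[{"line_total_excl_vat": "583.3300", "product_id": 5532548, "text": "PROD0008"}]'),
    (2, "Sally", '[{"line_total_excl_vat": "500.0000", "product_id": "", "text": "PROD003"}, '
                 '{"line_total_excl_vat": "250.0000", "product_id": 5532632, "text": "PROD005"}]'),
]


def as_text(value):
    """Render a JSON value roughly as json_each_text does for scalars."""
    if value is None:
        return None  # JSON null -> SQL NULL
    if isinstance(value, str):
        return value  # strings lose their quotes
    return json.dumps(value)  # everything else as JSON text, e.g. 5532548 -> "5532548"


def to_long_rows(rows):
    result = []
    for row_id, customer, blurb in rows:
        # enumerate(..., start=1) plays the role of WITH ORDINALITY.
        for record_number, element in enumerate(json.loads(blurb), start=1):
            for key, value in element.items():  # like json_each_text(x.val)
                result.append((row_id, customer, key, as_text(value), record_number))
    return result


for row in to_long_rows(tbl):
    print(row)
```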

Tidying up the JSON field would help a bit, and that is something that could be done before inserting the data into the table.
However, following your example, the code below should work:
create table public.yourtable (id integer, name varchar, others varchar);
insert into public.yourtable select 1,'Joe','[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]';
insert into public.yourtable select 2,'Sally','[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]';
with jsonb_table as (
select id, name,
('{'||regexp_replace(
unnest(string_to_array(others, '}, {')),
'\[|\]|\{|\}','','g')::varchar||'}')::jsonb as jsonb_data
from yourtable
)
select id,name, * from jsonb_table,
jsonb_to_record(jsonb_data)
as (line_total_excl_vat numeric,account varchar, subtitle varchar, product_id varchar, price_per_unit numeric, line_total_incl_vat numeric);
First we create jsonb_table, where we transform your dictionary field into a Postgres jsonb field by:
1) converting the string into an array by splitting on the '}, {' character sequence
2) unnesting the array elements to rows
3) removing the '[]{}' characters, wrapping each piece in '{' and '}', and casting the string to jsonb
Then we use the jsonb_to_record function to convert each jsonb record into columns. The column definition list must name every field you want to extract.
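Steps 1 to 3 can be replayed outside the database with a short Python sketch: `str.split` stands in for `string_to_array` and `re.sub` with the same pattern stands in for `regexp_replace`. `split_like_answer` is a hypothetical name, and the `tricky` value is an invented example. On the sample data the result agrees with a real JSON parser, but a brace or bracket inside a string value is silently stripped, so this text-splitting only works while the values never contain those characters.

```python
import json
import re

# Sally's row from the question, as stored in the TEXT column.
sally = ('[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", '
         '"price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", '
         '"amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", '
         '"subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", '
         '"line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]')


def split_like_answer(text):
    """Replay steps 1-3: split on '}, {', strip brackets/braces, re-wrap, parse."""
    objects = []
    for piece in text.split("}, {"):  # 1) + 2) split and unnest
        cleaned = re.sub(r"\[|\]|\{|\}", "", piece)  # 3) remove [ ] { }
        objects.append(json.loads("{" + cleaned + "}"))  # re-wrap as one object
    return objects


# On well-behaved data it agrees with a real JSON parser...
print(split_like_answer(sally) == json.loads(sally))
# ...but a brace inside a string value (hypothetical example) is silently removed.
tricky = '[{"text": "PROD{9}"}]'
print(split_like_answer(tricky), json.loads(tricky))
```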