Solr DIH: documents are not nested - solr

I'm trying to import some data as nested documents.
I've tried to simplify my problem as much as possible.
Here is my parent query:
SELECT 1 AS id,
'xxx' AS col1,
'yyy' AS col2
UNION
SELECT 2 AS id,
'xxx' AS col1,
'yyy' AS col2
UNION
SELECT 3 AS id,
'xxx' AS col1,
'yyy' AS col2
Its data looks like this:
1 | xxx | yyy
2 | xxx | yyy
3 | xxx | yyy
My child query is:
SELECT 1 AS id,
1 AS rel_id,
'aaa' AS col1,
'bbb' AS col2
UNION
SELECT 2 AS id,
1 AS rel_id,
'aaa' AS col1,
'bbb' AS col2
UNION
SELECT 3 AS id,
2 AS rel_id,
'aaa' AS col1,
'bbb' AS col2
UNION
SELECT 4 AS id,
3 AS rel_id,
'aaa' AS col1,
'bbb' AS col2
Data is:
1 | 1 | aaa | bbb
2 | 1 | aaa | bbb
3 | 2 | aaa | bbb
4 | 3 | aaa | bbb
I'm trying to nest using this DIH configuration:
<entity
    name="item"
    query="select 1 as id, 'xxx' as col1, 'yyy' as col2 union select 2 as id, 'xxx' as col1, 'yyy' as col2 union select 3 as id, 'xxx' as col1, 'yyy' as col2">
  <field column="id" name="id"/>
  <field column="col1" name="column1_s"/>
  <field column="col2" name="column2_s"/>
  <entity
      name="autor"
      child="true"
      query="select 1 as id, 1 as rel_id, 'aaa' as col1, 'bbb' as col2 union select 2 as id, 1 as rel_id, 'aaa' as col1, 'bbb' as col2 union select 3 as id, 2 as rel_id, 'aaa' as col1, 'bbb' as col2 union select 4 as id, 3 as rel_id, 'aaa' as col1, 'bbb' as col2"
      cacheKey="rel_id" cacheLookup="item.id" cacheImpl="SortedMapBackedCache">
    <field column="node_type" template="autor"/>
    <field column="alt_code" name="id" template="${autor.id}-${autor.rel_id}"/>
    <field column="col1" name="column1_s"/>
    <field column="col2" name="column2_s"/>
  </entity>
</entity>
However, they are not nested:
$ curl "http://localhost:8983/solr/arxius/select?q=*%3A*"
{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "q": "*:*"
    }
  },
  "response": {
    "numFound": 7,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      {
        "id": "1",
        "column2_s": "bbb",
        "column1_s": "aaa",
        "_version_": 1682901366056419300
      },
      {
        "id": "2",
        "column2_s": "bbb",
        "column1_s": "aaa",
        "_version_": 1682901366056419300
      },
      {
        "id": "1",
        "column2_s": "yyy",
        "column1_s": "xxx",
        "_version_": 1682901366056419300
      },
      {
        "id": "3",
        "column2_s": "bbb",
        "column1_s": "aaa",
        "_version_": 1682901366058516500
      },
      {
        "id": "2",
        "column2_s": "yyy",
        "column1_s": "xxx",
        "_version_": 1682901366058516500
      },
      {
        "id": "4",
        "column2_s": "bbb",
        "column1_s": "aaa",
        "_version_": 1682901366058516500
      },
      {
        "id": "3",
        "column2_s": "yyy",
        "column1_s": "xxx",
        "_version_": 1682901366058516500
      }
    ]
  }
}
As you can see, the documents are not nested.
I've been struggling with this issue for a while and have tried to reduce the problem to its simplest form. I hope the explanation is clear. Any ideas?
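For reference, my understanding is that once the children are indexed as true nested documents, they should come back through the [child] doc transformer, roughly like this (an assumption on my part; it relies on a Solr 8+ schema that declares the _root_ and _nest_path_ fields):
$ curl "http://localhost:8983/solr/arxius/select?q=-_nest_path_:*&fl=*,%5Bchild%5D"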

Related

snowflake, how to combine all result into json when doing a group by

I have a table (table-1) with values like this:
id   | value-1 | value-2
-----+---------+---------------------------------
id-1 | test    | {"id": "value","other": "this"}
id-2 | foo-1   | {"id": "value","other": "this"}
id-2 | foo-2   | {"id": "value1","other": "this"}
I want to be able to group by the id column and produce results like:
id   | json_value
-----+------------
id-1 | [{"value-1": "test","value-2": {"id": "value","other": "this"}}]
id-2 | [{"value-1":"foo-1","value-2":{"id":"value","other":"this"}},{"value-1":"foo-2","value-2":{"id":"value1","other":"this"}}]
That is, collect all columns for each grouped id and convert each row to JSON, using the column name as the key and the column value as the value.
You can use object_construct and array_agg within group to do this:
create or replace temp table T1 as
select
COLUMN1::string as id,
COLUMN2::string as "value-1",
COLUMN3::string as "value-2"
from (values
('id-1','test','{"id": "value","other": "this"}'),
('id-2','foo-1','{"id": "value","other": "this"}'),
('id-2','foo-2','{"id": "value1","other": "this"}')
);
select id
,array_agg(object_construct('value-1', "value-1", 'value-2', parse_json("value-2")))
within group (order by ID) as JSON_VALUE
from T1
group by ID
;
ID   | JSON_VALUE
-----+------------
id-1 | [ { "value-1": "test", "value-2": { "id": "value", "other": "this" } } ]
id-2 | [ { "value-1": "foo-1", "value-2": { "id": "value", "other": "this" } }, { "value-1": "foo-2", "value-2": { "id": "value1", "other": "this" } } ]

Postgresql array to json

I have a table like:
id | time_serie          | value
---+---------------------+------
1  | 2020-09-25 00:00:00 | 100
1  | 2020-09-25 00:10:00 | 200
1  | 2020-09-25 00:20:00 | 300
1  | 2020-09-25 00:30:00 | 400
I want a JSON output as:
{
  "ID": 1,
  "time_serie": [
    { "position": 1, "inQuantity": 100 },
    { "position": 2, "inQuantity": 200 },
    { "position": 3, "inQuantity": 300 },
    { "position": 4, "inQuantity": 400 }
    ...
  ]
}
Thanks
You can use a mix of JSON functions along with the ROW_NUMBER() window function in order to generate the positions, such as:
SELECT *
FROM
(
  SELECT JSON_BUILD_OBJECT('ID', id,
                           'time_serie',
                           JSON_AGG(
                             JSON_BUILD_OBJECT('position', rn, 'inQuantity', value)
                           )
                          )
  FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_serie) AS rn FROM t) AS t
  GROUP BY id
) AS j
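For reference, a jsonb variant of the same idea (a sketch, assuming the same table t with columns id, time_serie, and value); pinning the aggregate order to rn also guarantees the positions appear in sequence:
-- Sketch: jsonb equivalent with an explicit ORDER BY inside the aggregate.
SELECT jsonb_build_object(
         'ID', id,
         'time_serie',
         jsonb_agg(jsonb_build_object('position', rn, 'inQuantity', value)
                   ORDER BY rn)
       )
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY time_serie) AS rn
      FROM t) AS x
GROUP BY id;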

How to join JSON to update multiple rows by primary key

I am trying to update a log with JSON in SQL Server 2017. I can update a data point with json_value, which covers a few cases, but would ultimately like to join in incoming JSON.
Sample table:
key | col_1                        | col_2         | col_3
----+------------------------------+---------------+-----------------
1   | json.lines[0].data.meta.data | json.lines[0] | json.header.note
2   | json.lines[1].data.meta.data | json.lines[1] | json.header.note
3   | json.lines[2].data.meta.data | json.lines[2] | json.header.note
I'd like to update a single property in col_1 and update col_2 with an object serialized as a string.
Sample JSON:
declare @json nvarchar(max) = '[{
header: {
note: 'some note'
}, lines: [{
data {
id: {
key: 0,
name: 'item_1'
},
meta: {
data: 'item_1_data'
}
}, {...}, {...}
}]
}]'
Query:
update logTable set
  col_1 = json_value(@json, '$.lines[__index__].data.meta.data'), -- what would the syntax for __index__ be?
  col_2 = j.lines[key], -- pseudo code
  col_3 = json_value(@json, '$.header.note')
inner join openjson(@json) j
  on json_value(@json, '$.line[?].id.key') = logTable.[key] -- ? denotes indices that I'd like to iterate = join over
Expected Output:
key | col_1 | col_2 | col_3
----+---------------+----------------------------|---------
1 | 'item_1_data' | 'data: { id: { key: 0...}' | '{header: { note: ...} }'
2 | 'item_2_data' | 'data: { id: { key: 1...}' | '{header: { note: ...} }'
3 | 'item_3_data' | 'data: { id: { key: 2...}' | '{header: { note: ...} }'
I'm not sure how to handle iterating over the $.line indices, but think a join would solve this if properly implemented.
How can I join to arrays of objects to update SQL rows by primary key?
Original answer:
You may try to parse your JSON using OPENJSON with an explicit schema (note that your JSON is not valid):
Table and JSON:
CREATE TABLE #Data (
[key] int,
col_1 nvarchar(100),
col_2 nvarchar(max)
)
INSERT INTO #Data
([key], [col_1], [col_2])
VALUES
(1, N'', N''),
(2, N'', N''),
(3, N'', N'')
DECLARE @json nvarchar(max) = N'[{
  "lines": [
    {
      "data": {
        "id": {
          "key": 1,
          "name": "item_1"
        },
        "meta": {
          "data": "item_1_data"
        }
      }
    },
    {
      "data": {
        "id": {
          "key": 2,
          "name": "item_2"
        },
        "meta": {
          "data": "item_2_data"
        }
      }
    },
    {
      "data": {
        "id": {
          "key": 3,
          "name": "item_3"
        },
        "meta": {
          "data": "item_3_data"
        }
      }
    }
  ]
}]'
Statement:
UPDATE #Data
SET
    col_1 = j.metadata,
    col_2 = j.data
FROM #Data
INNER JOIN (
    SELECT *
    FROM OPENJSON(@json, '$[0].lines') WITH (
        [key] int '$.data.id.key',
        metadata nvarchar(100) '$.data.meta.data',
        data nvarchar(max) '$' AS JSON
    )
) j ON #Data.[key] = j.[key]
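To sanity-check what the join will see before running the UPDATE, the derived table can be run on its own; a quick sketch:
-- Sketch: inspect the parsed rows (key, scalar value, and raw JSON fragment).
SELECT [key], metadata, data
FROM OPENJSON(@json, '$[0].lines') WITH (
    [key] int '$.data.id.key',
    metadata nvarchar(100) '$.data.meta.data',
    data nvarchar(max) '$' AS JSON
)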
Update:
Header is common for all rows, so use JSON_QUERY() to update the table:
Table and JSON:
CREATE TABLE #Data (
[key] int,
col_1 nvarchar(100),
col_2 nvarchar(max),
col_3 nvarchar(max)
)
INSERT INTO #Data
([key], col_1, col_2, col_3)
VALUES
(1, N'', N'', N''),
(2, N'', N'', N''),
(3, N'', N'', N'')
DECLARE @json nvarchar(max) = N'[{
  "header": {
    "note": "some note"
  },
  "lines": [
    {
      "data": {
        "id": {
          "key": 1,
          "name": "item_1"
        },
        "meta": {
          "data": "item_1_data"
        }
      }
    },
    {
      "data": {
        "id": {
          "key": 2,
          "name": "item_2"
        },
        "meta": {
          "data": "item_2_data"
        }
      }
    },
    {
      "data": {
        "id": {
          "key": 3,
          "name": "item_3"
        },
        "meta": {
          "data": "item_3_data"
        }
      }
    }
  ]
}]'
Statement:
UPDATE #Data
SET
    col_1 = j.metadata,
    col_2 = j.data,
    col_3 = JSON_QUERY(@json, '$[0].header')
FROM #Data
INNER JOIN (
    SELECT *
    FROM OPENJSON(@json, '$[0].lines') WITH (
        [key] int '$.data.id.key',
        metadata nvarchar(100) '$.data.meta.data',
        data nvarchar(max) '$' AS JSON
    )
) j ON #Data.[key] = j.[key]
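And a quick way to confirm what JSON_QUERY() will write into col_3 for every row (a sketch):
-- Sketch: JSON_QUERY returns the header object as a JSON fragment.
SELECT JSON_QUERY(@json, '$[0].header') AS header
-- header: { "note": "some note" }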

Unnesting a list of JSON objects in PostgreSQL

I have a TEXT column in my PostgreSQL (9.6) database containing a list of one or more dictionaries, like these:
[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
or
[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]
I would like to retrieve each dictionary from the column and parse it into separate columns.
For this example:
id | customer | blurb
---+----------+------
1 | Joe | [{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
2 | Sally | [{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]
would become:
id | customer | line_total_excl_vat | account | product_id | ...
---+----------+---------------------+---------+------------
1  | Joe      | 583.3300            | null    | 5532548
2  | Sally    | 500.0000            | null    | null
3  | Sally    | 250.0000            | null    | 5532632
If you know beforehand which fields you want to extract, cast the text to json / jsonb and use json_to_recordset / jsonb_to_recordset. Note that this method requires the field names / types to be specified explicitly; unspecified fields present in the JSON dictionaries will not be extracted.
See the official PostgreSQL documentation on JSON functions.
A self-contained example:
WITH tbl (id, customer, dat) as ( values
(1, 'Joe',
'[{ "line_total_excl_vat": "583.3300"
, "account": ""
, "subtitle": ""
, "product_id": 5532548
, "price_per_unit": "583.3333"
, "line_total_incl_vat": "700.0000"
, "text": "PROD0008"
, "amount": "1.0000"
, "vat_rate": "20"}]')
,(2, 'Sally',
'[{ "line_total_excl_vat": "500.0000"
, "account": ""
, "subtitle": ""
, "product_id": ""
, "price_per_unit": "250.0000"
, "line_total_incl_vat": "600.0000"
, "text": "PROD003"
, "amount": "2.0000"
, "vat_rate": "20"}
, { "line_total_excl_vat": "250.0000"
, "account": ""
, "subtitle": ""
, "product_id": 5532632
, "price_per_unit": "250.0000"
, "line_total_incl_vat": "300.0000"
, "text": "PROD005"
, "amount": "1.0000"
, "vat_rate": "20"}]')
)
SELECT id, customer, x.*
FROM tbl
, json_to_recordset(dat::json) x
( line_total_excl_vat numeric
, account text
, subtitle text
, product_id text
, price_per_unit numeric
, line_total_incl_vat numeric
, "text" text
, amount numeric
, vat_rate numeric
)
produces the following output:
id | customer | line_total_excl_vat | account | subtitle | product_id | price_per_unit | line_total_incl_vat | text     | amount | vat_rate
---+----------+---------------------+---------+----------+------------+----------------+---------------------+----------+--------+---------
1  | Joe      | 583.33              |         |          | 5532548    | 583.3333       | 700                 | PROD0008 | 1      | 20
2  | Sally    | 500                 |         |          |            | 250            | 600                 | PROD003  | 2      | 20
2  | Sally    | 250                 |         |          | 5532632    | 250            | 300                 | PROD005  | 1      | 20
This format is often referred to as the wide format.
It is also possible to extract the data in a long format, which has the additional benefit that it keeps all the data without explicitly mentioning the field names. In this case, the query may be written as (the test data is elided for brevity)
SELECT id, customer, y.key, y.value, x.record_number
FROM tbl
, lateral json_array_elements(dat::json) WITH ORDINALITY AS x (val, record_number)
, lateral json_each_text(x.val) y
The WITH ORDINALITY in the above statement adds a sequence number for each element of the unnested array, and is used to disambiguate fields coming from different array elements for the same customer.
This produces the following output:
id | customer | key                 | value    | record_number
---+----------+---------------------+----------+--------------
1  | Joe      | line_total_excl_vat | 583.3300 | 1
1  | Joe      | account             |          | 1
1  | Joe      | subtitle            |          | 1
1  | Joe      | product_id          | 5532548  | 1
1  | Joe      | price_per_unit      | 583.3333 | 1
1  | Joe      | line_total_incl_vat | 700.0000 | 1
1  | Joe      | text                | PROD0008 | 1
1  | Joe      | amount              | 1.0000   | 1
1  | Joe      | vat_rate            | 20       | 1
2  | Sally    | line_total_excl_vat | 500.0000 | 1
2  | Sally    | account             |          | 1
2  | Sally    | subtitle            |          | 1
2  | Sally    | product_id          |          | 1
2  | Sally    | price_per_unit      | 250.0000 | 1
2  | Sally    | line_total_incl_vat | 600.0000 | 1
2  | Sally    | text                | PROD003  | 1
2  | Sally    | amount              | 2.0000   | 1
2  | Sally    | vat_rate            | 20       | 1
2  | Sally    | line_total_excl_vat | 250.0000 | 2
2  | Sally    | account             |          | 2
2  | Sally    | subtitle            |          | 2
2  | Sally    | product_id          | 5532632  | 2
2  | Sally    | price_per_unit      | 250.0000 | 2
2  | Sally    | line_total_incl_vat | 300.0000 | 2
2  | Sally    | text                | PROD005  | 2
2  | Sally    | amount              | 1.0000   | 2
2  | Sally    | vat_rate            | 20       | 2
Tidying up the JSON field would help a little, and that's something that could be done before inserting the data into the table.
However, following your example, the code below should work:
create table public.yourtable (id integer, name varchar, others varchar);
insert into public.yourtable select 1,'Joe','[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]';
insert into public.yourtable select 2,'Sally','[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]';
with jsonb_table as (
select id, name,
('{'||regexp_replace(
unnest(string_to_array(others, '}, {')),
'\[|\]|\{|\}','','g')::varchar||'}')::jsonb as jsonb_data
from yourtable
)
select id,name, * from jsonb_table,
jsonb_to_record(jsonb_data)
as (line_total_excl_vat numeric,account varchar, subtitle varchar, product_id varchar, price_per_unit numeric, line_total_incl_vat numeric);
First we create the jsonb_table, where we transform your dictionary field into a PostgreSQL jsonb field by:
1) converting the string into an array by splitting on the '}, {' character sequence
2) unnesting the array elements into rows
3) cleaning up the '[]{}' characters and converting the string to jsonb
Then we use the jsonb_to_record function to convert the jsonb records into columns. There we have to specify as many fields as needed for the column definitions.
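Note that when the stored text already parses as valid JSON (as it does in the examples above), the regex cleanup can be skipped by casting directly and unnesting with jsonb_array_elements; a sketch with a reduced column list:
-- Sketch: direct cast, no regex; one output row per array element.
select t.id, t.name, r.*
from yourtable t,
     jsonb_array_elements(t.others::jsonb) as e(elem),
     jsonb_to_record(e.elem) as r (
       line_total_excl_vat numeric,
       account varchar,
       product_id varchar
     );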

PostgreSQL json and array processing

I need to output JSON from the query.
Input data:
Documents:
==========
id | name | team
------------------
1 | doc1 | {"authors": [1, 2, 3], "editors": [3, 4, 5]}
Persons:
========
id | name |
--------------
1 | Person1 |
2 | Person2 |
3 | Person3 |
4 | Person4 |
5 | Person5 |
Query:
select d.id, d.name,
(select jsonb_build_object(composed)
from
(
select teamGrp.key,
(
select json_build_array(persAgg) from
(
select
(
select jsonb_agg(pers) from
(
select person.id, person.name
from
persons
where (persList.value)::int=person.id
) pers
)
from
json_array_elements_text(teamGrp.value::json) persList
) persAgg
)
from
jsonb_each_text(d.team) teamGrp
) teamed
) as teams
from
documents d;
and I expect the following output:
{"id": 1, "name": "doc1", "teams":
  {"authors": [{"id": 1, "name": "Person1"}, {"id": 2, "name": "Person2"}, {"id": 3, "name": "Person3"}],
   "editors": [{"id": 3, "name": "Person3"}, {"id": 4, "name": "Person4"}, {"id": 5, "name": "Person5"}]}}
But I received an error:
ERROR: more than one row returned by a subquery used as an expression
Where is the problem, and how can I fix it?
PostgreSQL 9.5
I think the following (super complicated) query should do it:
SELECT
json_build_object(
'id',id,
'name',name,
'teams',(
SELECT json_object_agg(team_name,
(SELECT
json_agg(json_build_object('id',value,'name',Persons.name))
FROM json_array_elements(team_members)
INNER JOIN Persons ON (value#>>'{}')::integer=Persons.id
)
)
FROM json_each(team) t(team_name,team_members)
)
)
FROM Documents;
I am using subqueries in which I run JSON aggregates.
