Unnesting a list of JSON objects in PostgreSQL

I have a TEXT column in my PostgreSQL (9.6) database containing a list of one or more dictionaries, such as these:
[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
or
[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]
I would like to retrieve each dictionary from the column and parse it into separate columns.
For this example:
id | customer | blurb
---+----------+------
1 | Joe | [{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
2 | Sally | [{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]
would become:
id | customer | line_total_excl_vat | account | product_id | ...
---+----------+----------------------+---------+------------
1 | Joe | 583.3300 | null| 5532548
2 | Sally | 500.0000 | null| null
3 | Sally | 250.0000 | null| 5532632

If you know beforehand which fields you want to extract, cast the text to json/jsonb and use json_to_recordset/jsonb_to_recordset. Note that this method requires the field names and types to be specified explicitly. Fields that are present in the JSON objects but not listed in the column definition will not be extracted.
See the official PostgreSQL documentation on JSON functions.
Self-contained example:
WITH tbl (id, customer, dat) as ( values
(1, 'Joe',
'[{ "line_total_excl_vat": "583.3300"
, "account": ""
, "subtitle": ""
, "product_id": 5532548
, "price_per_unit": "583.3333"
, "line_total_incl_vat": "700.0000"
, "text": "PROD0008"
, "amount": "1.0000"
, "vat_rate": "20"}]')
,(2, 'Sally',
'[{ "line_total_excl_vat": "500.0000"
, "account": ""
, "subtitle": ""
, "product_id": ""
, "price_per_unit": "250.0000"
, "line_total_incl_vat": "600.0000"
, "text": "PROD003"
, "amount": "2.0000"
, "vat_rate": "20"}
, { "line_total_excl_vat": "250.0000"
, "account": ""
, "subtitle": ""
, "product_id": 5532632
, "price_per_unit": "250.0000"
, "line_total_incl_vat": "300.0000"
, "text": "PROD005"
, "amount": "1.0000"
, "vat_rate": "20"}]')
)
SELECT id, customer, x.*
FROM tbl
, json_to_recordset(dat::json) x
( line_total_excl_vat numeric
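, account text
, subtitle text
, product_id text
, price_per_unit numeric
, line_total_incl_vat numeric
, "text" text
, amount numeric
, vat_rate numeric
)
produces the following output:
id | customer | line_total_excl_vat | account | subtitle | product_id | price_per_unit | line_total_incl_vat | text     | amount | vat_rate
---+----------+---------------------+---------+----------+------------+----------------+---------------------+----------+--------+---------
1  | Joe      | 583.33              |         |          | 5532548    | 583.3333       | 700                 | PROD0008 | 1      | 20
2  | Sally    | 500                 |         |          |            | 250            | 600                 | PROD003  | 2      | 20
2  | Sally    | 250                 |         |          | 5532632    | 250            | 300                 | PROD005  | 1      | 20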
This format is often referred to as the wide format.
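The wide-format behaviour of json_to_recordset can be mimicked in a few lines of Python, which may help when reasoning about it. This is a sketch of the semantics only, not how PostgreSQL implements it: every array element becomes one row, only the fields named in the column definition are kept, and a key missing from an object comes back as None, much like NULL. The sample rows are trimmed to four of the question's fields, and COLUMNS, to_recordset and wide are hypothetical names.

```python
import json
from decimal import Decimal

# Trimmed sample: (id, customer, TEXT column holding a JSON array).
ROWS = [
    (1, "Joe", '[{"line_total_excl_vat": "583.3300", "account": "", '
               '"product_id": 5532548, "text": "PROD0008"}]'),
    (2, "Sally", '[{"line_total_excl_vat": "500.0000", "account": "", '
                 '"product_id": "", "text": "PROD003"}, '
                 '{"line_total_excl_vat": "250.0000", "account": "", '
                 '"product_id": 5532632, "text": "PROD005"}]'),
]

# Hypothetical column definition list, like the "x (...)" clause:
# field name -> conversion. Fields not listed here are ignored.
COLUMNS = {
    "line_total_excl_vat": Decimal,
    "account": str,
    "product_id": str,
    "text": str,
}


def to_recordset(blob, columns):
    """Yield one tuple per object in the JSON array, in column order."""
    for obj in json.loads(blob):
        # A missing key becomes None, the analogue of SQL NULL.
        yield tuple(
            convert(obj[name]) if name in obj else None
            for name, convert in columns.items()
        )


def wide(rows, columns):
    """Repeat id/customer for every array element (the implicit lateral join)."""
    return [
        (row_id, customer) + record
        for row_id, customer, blob in rows
        for record in to_recordset(blob, columns)
    ]


for line in wide(ROWS, COLUMNS):
    print(line)
```

Sally's single TEXT value turns into two output rows, and the product_id number 5532548 comes out as the string "5532548" because the column is declared as text.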
It is also possible to extract the data in a long format, which has the additional benefit of keeping all the data without naming the fields explicitly. In this case, the query can be written as follows (the test data is omitted for brevity):
SELECT id, customer, y.key, y.value, x.record_number
FROM tbl
, lateral json_array_elements(dat::json) WITH ORDINALITY AS x (val, record_number)
, lateral json_each_text(x.val) y
The WITH ORDINALITY clause in the above statement adds a sequence number to each element of the unnested array; it is used to tell apart the fields that come from different array elements of the same customer.
This produces the following output:
id | customer | key                 | value    | record_number
---+----------+---------------------+----------+--------------
1  | Joe      | line_total_excl_vat | 583.3300 | 1
1  | Joe      | account             |          | 1
1  | Joe      | subtitle            |          | 1
1  | Joe      | product_id          | 5532548  | 1
1  | Joe      | price_per_unit      | 583.3333 | 1
1  | Joe      | line_total_incl_vat | 700.0000 | 1
1  | Joe      | text                | PROD0008 | 1
1  | Joe      | amount              | 1.0000   | 1
1  | Joe      | vat_rate            | 20       | 1
2  | Sally    | line_total_excl_vat | 500.0000 | 1
2  | Sally    | account             |          | 1
2  | Sally    | subtitle            |          | 1
2  | Sally    | product_id          |          | 1
2  | Sally    | price_per_unit      | 250.0000 | 1
2  | Sally    | line_total_incl_vat | 600.0000 | 1
2  | Sally    | text                | PROD003  | 1
2  | Sally    | amount              | 2.0000   | 1
2  | Sally    | vat_rate            | 20       | 1
2  | Sally    | line_total_excl_vat | 250.0000 | 2
2  | Sally    | account             |          | 2
2  | Sally    | subtitle            |          | 2
2  | Sally    | product_id          | 5532632  | 2
2  | Sally    | price_per_unit      | 250.0000 | 2
2  | Sally    | line_total_incl_vat | 300.0000 | 2
2  | Sally    | text                | PROD005  | 2
2  | Sally    | amount              | 1.0000   | 2
2  | Sally    | vat_rate            | 20       | 2
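The long-format query can be sketched the same way in Python, assuming trimmed sample data: enumerate(..., start=1) stands in for WITH ORDINALITY, and the inner loop over the object's items stands in for json_each_text. The as_text helper is a hypothetical approximation of how json_each_text renders non-string values, not a reimplementation of it.

```python
import json

# Trimmed sample rows: (id, customer, TEXT column holding a JSON array).
ROWS = [
    (1, "Joe", '[{"line_total_excl_vat": "583.3300", "product_id": 5532548}]'),
    (2, "Sally", '[{"line_total_excl_vat": "500.0000", "product_id": ""}, '
                 '{"line_total_excl_vat": "250.0000", "product_id": 5532632}]'),
]


def as_text(value):
    """Approximate json_each_text: strings unquoted, other values as JSON text."""
    if value is None:
        return None                # JSON null -> SQL NULL
    if isinstance(value, str):
        return value
    return json.dumps(value)       # numbers, booleans, nested arrays/objects


def long_format(rows):
    out = []
    for row_id, customer, blob in rows:
        # enumerate(..., start=1) plays the role of WITH ORDINALITY (1-based).
        for record_number, element in enumerate(json.loads(blob), start=1):
            # One output row per key/value pair, like json_each_text.
            for key, value in element.items():
                out.append((row_id, customer, key, as_text(value), record_number))
    return out


for line in long_format(ROWS):
    print(line)
```

Because record_number restarts at 1 for every customer, (id, record_number) identifies which array element each key/value pair came from.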

Tidying up the JSON field would help a bit, and that is something that could be done before inserting the data into the table.
However, following your example, the code below should work:
create table public.yourtable (id integer, name varchar, others varchar);
insert into public.yourtable select 1,'Joe','[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]';
insert into public.yourtable select 2,'Sally','[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]';
with jsonb_table as (
select id, name,
('{'||regexp_replace(
unnest(string_to_array(others, '}, {')),
'\[|\]|\{|\}','','g')::varchar||'}')::jsonb as jsonb_data
from yourtable
)
select id,name, * from jsonb_table,
jsonb_to_record(jsonb_data)
as (line_total_excl_vat numeric,account varchar, subtitle varchar, product_id varchar, price_per_unit numeric, line_total_incl_vat numeric);
First, we build jsonb_table, which transforms your dictionary field into a PostgreSQL jsonb value by:
1) converting the string into an array by splitting it on the '}, {' character sequence
2) unnesting the array elements into rows
3) removing the '[]{}' characters, re-wrapping each piece in braces and casting the string to jsonb
Then we use the jsonb_to_record function to turn each jsonb object into columns. The column definition list has to name every field you want to extract, together with its type. Note that this string manipulation assumes that no value contains the characters [ ] { } or the sequence '}, {'.
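To see both what the three steps do and where they can go wrong, here is a Python re-creation of the same string manipulation (an illustration only; split_and_rewrap is a hypothetical name). On data shaped like the question's it yields the same objects as a real JSON parser, but a value containing a brace is silently altered, which is why parsing the whole text as JSON, as json_to_recordset does in the other answer, is the more robust option.

```python
import json
import re


def split_and_rewrap(text):
    """Mimic the string_to_array / regexp_replace / '{'||...||'}' steps."""
    pieces = text.split("}, {")                         # steps 1 and 2: split, one piece per row
    result = []
    for piece in pieces:
        cleaned = re.sub(r"\[|\]|\{|\}", "", piece)     # step 3: strip [ ] { }
        result.append(json.loads("{" + cleaned + "}"))  # re-wrap and parse
    return result


sample = '[{"text": "PROD003", "amount": "2.0000"}, {"text": "PROD005", "amount": "1.0000"}]'
print(split_and_rewrap(sample) == json.loads(sample))   # True for well-behaved data

tricky = '[{"text": "PROD{9}"}]'
print(split_and_rewrap(tricky))                          # braces inside the value are lost
```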

Related

Laravel 8: how to join many-to-many relationship tables and create a query using DB

I am using Laravel 8 and I want to display a list of events by joining tables that have a many-to-many relationship.
Here is how my tables look:
Users Table
| id | firstname | status |
|----|------------|--------|
| 1 | Amy | 0 |
| 2 | 2 amy | 0 |
| 3 | 3 amy | 1 |
| 4 | 4 amy | 0 |
| 5 | 5 amy | 1 |
| 6 | 6 amy | 1 |
Here is my pivot table
events_users Table
| id | event_id | user_id |
|----|------------|---------|
| 1 | 123 | 1 |
| 1 | 123 | 2 |
| 1 | 123 | 3 |
| 1 | 123 | 4 |
Here is my events table
events Table
| id | eventid | title |
|----|------------|---------|
| 1 | 123 | title |
| 1 | 124 | title 1 |
| 1 | 125 | title 2 |
| 1 | 126 | title 3 |
Here is the code in my model that fetches the results:
$events = DB::table('events')
->join('events_users', 'events.eventid', '=', 'events_users.event_id')
->join('users', 'users.id', '=', 'events_users.user_id')
->when($sortBy, function ($query, $sortBy) {
return $query->orderBy($sortBy);
}, function ($query) {
return $query->orderBy('events.created_at', 'desc');
})
->when($search_query, function ($query, $search_query) {
return $query->where('title', 'like', '%'. $search_query . '%');
})
->select(
'title', 'eventuid', 'description', 'start_date',
'end_date', 'start_time', 'end_time', 'status',
'venue', 'address_line_1', 'address_line_2', 'address_line_3',
'postcode', 'city', 'city_id', 'country', 'image',
'users.firstname', 'users.lastname', 'users.avatar'
)
->simplePaginate(15);
This results in duplicate entries, one per event-user pair:
Current Result:
{
"current_page": 1,
"data": [
{
"title": "Who in the newspapers, at the mushroom (she had.",
"eventuid": "be785bac-70d5-379f-a6f8-b35e66c8e494",
"description": "I'd been the whiting,' said Alice, 'and why it is I hate cats and dogs.' It was opened by another footman in livery came running out of sight before the trial's over!' thought Alice. 'I'm glad they.",
"start_date": "2000-11-17",
"end_date": "1988-02-24",
"start_time": "1972",
"end_time": "2062",
"status": 1,
"venue": "4379",
"address_line_1": "Kuhn Expressway",
"address_line_2": "2295 Kerluke Drive Suite 335",
"address_line_3": "Fredtown",
"postcode": "57094",
"city": "New Cassidyburgh",
"city_id": 530,
"country": "Cocos (Keeling) Islands",
"image": "https://via.placeholder.com/1280x720.png/00dd99?text=repellat",
"firstname": "Marielle",
"lastname": "Tremblay",
"avatar": "https://via.placeholder.com/640x480.png/002277?text=eum"
},
{
"title": "Who in the newspapers, at the mushroom (she had.",
"eventuid": "be785bac-70d5-379f-a6f8-b35e66c8e494",
"description": "I'd been the whiting,' said Alice, 'and why it is I hate cats and dogs.' It was opened by another footman in livery came running out of sight before the trial's over!' thought Alice. 'I'm glad they.",
"start_date": "2000-11-17",
"end_date": "1988-02-24",
"start_time": "1972",
"end_time": "2062",
"status": 1,
"venue": "4379",
"address_line_1": "Kuhn Expressway",
"address_line_2": "2295 Kerluke Drive Suite 335",
"address_line_3": "Fredtown",
"postcode": "57094",
"city": "New Cassidyburgh",
"city_id": 530,
"country": "Cocos (Keeling) Islands",
"image": "https://via.placeholder.com/1280x720.png/00dd99?text=repellat",
"firstname": "Floyd",
"lastname": "Waelchi",
"avatar": "https://via.placeholder.com/640x480.png/0033cc?text=inventore"
},
...
]
}
What I want to retrieve is something like this:
Expecting:
{
"current_page": 1,
"data": [
{
"title": "Who in the newspapers, at the mushroom (she had.",
"eventuid": "be785bac-70d5-379f-a6f8-b35e66c8e494",
"description": "I'd been the whiting,' said Alice, 'and why it is I hate cats and dogs.' It was opened by another footman in livery came running out of sight before the trial's over!' thought Alice. 'I'm glad they.",
"start_date": "2000-11-17",
"end_date": "1988-02-24",
"start_time": "1972",
"end_time": "2062",
"status": 1,
"venue": "4379",
"address_line_1": "Kuhn Expressway",
"address_line_2": "2295 Kerluke Drive Suite 335",
"address_line_3": "Fredtown",
"postcode": "57094",
"city": "New Cassidyburgh",
"city_id": 530,
"country": "Cocos (Keeling) Islands",
"image": "https://via.placeholder.com/1280x720.png/00dd99?text=repellat",
"users": [
{
"firstname": "Marielle",
"lastname": "Tremblay",
"avatar": "https://via.placeholder.com/640x480.png/002277?text=eum"
},
{
"firstname": "Amy",
"lastname": "Bond",
"avatar": "https://via.placeholder.com/640x480.png/005277?text=eum"
}
]
},
...
]
}
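No answer is included here, but the shape being asked for can be sketched independently of Laravel. The join returns one row per event-user pair, so the rows have to be grouped by event, keeping the event columns once and collecting the user columns into a users list. The Python below only illustrates that grouping; USER_FIELDS, the use of eventuid as the grouping key and the sample rows are assumptions. Note that grouping after simplePaginate(15) would paginate join rows rather than events; in Laravel itself, a belongsToMany relationship loaded eagerly with with('users') is the more usual way to get this nested shape.

```python
# Fields that belong to the user side of the join (assumption based on the select list).
USER_FIELDS = ("firstname", "lastname", "avatar")


def nest_users(rows, key="eventuid"):
    """Collapse one-row-per-event-user into one dict per event with a users list."""
    events = {}  # dicts keep insertion order, so the query's sort order survives
    for row in rows:
        event_id = row[key]
        if event_id not in events:
            event = {k: v for k, v in row.items() if k not in USER_FIELDS}
            event["users"] = []
            events[event_id] = event
        events[event_id]["users"].append({k: row[k] for k in USER_FIELDS})
    return list(events.values())


# Hypothetical flat rows, as returned by the join.
rows = [
    {"title": "Event A", "eventuid": "a", "firstname": "Marielle", "lastname": "Tremblay", "avatar": "m.png"},
    {"title": "Event A", "eventuid": "a", "firstname": "Floyd", "lastname": "Waelchi", "avatar": "f.png"},
    {"title": "Event B", "eventuid": "b", "firstname": "Amy", "lastname": "Bond", "avatar": "a.png"},
]
for event in nest_users(rows):
    print(event["title"], [u["firstname"] for u in event["users"]])
```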

jq - find the name of the array inside JSON object and then get the content of the array

I have the following JSON array:
[
{
"city": "Seattle",
"array10": [
"1",
"2"
]
},
{
"city": "Seattle",
"array11": [
"3"
]
},
{
"city": "Chicago",
"array20": [
"1",
"2"
]
},
{
"city": "Denver",
"array30": [
"3"
]
},
{
"city": "Reno",
"array50": [
"1"
]
}
]
My task is the following: for each "city" value (the values are known), get the names of the arrays and, for each array, print its contents. Names of cities and arrays are unique; the contents of the arrays are not.
The result should look like the following:
Now working on Seattle
Seattle has the following arrays:
array10
array11
Content of the array10
1
2
Content of the array11
3
Now working on Chicago
Chicago has the following arrays:
array20
Content of the array20
1
2
Now working on Denver
Denver has the following arrays:
array30
Content of the array30
3
Now working on Reno
Reno has the following arrays:
array50
Content of the array50
1
Now, for each city name (the names are provided/known), I can find the names of the arrays with the following filter (obviously I can put the city names in variables):
jq -r '.[] | select(.city | test("Seattle")) | del(.city) | keys | @tsv'
Then I assign these names to a bash variable and iterate over them in a second loop to get the contents of each array.
While I can get what I want this way, my question is: is there a more efficient way to do it with jq?
A second, related question: if my JSON had the structure below, would it make my task easier from a speed/efficiency/simplicity standpoint?
[
{
"name": "Seattle",
"content": {
"array10": [
"1",
"2"
],
"array11": [
"3"
]
}
},
{
"name": "Chicago",
"content": {
"array20": [
"1",
"2"
]
}
},
{
"name": "Denver",
"content": {
"array30": [
"3"
]
}
},
{
"name": "Reno",
"content": {
"array50": [
"1"
]
}
}
]
With the -r command-line option, the following program produces the output shown below:
group_by(.city)[]
| .[0].city as $city
| map(keys_unsorted[] | select(test("^array"))) as $arrays
| "Now working on \($city)",
"\($city) has the following arrays:",
$arrays[],
(.[] | to_entries[] | select(.key | test("^array"))
| "Content of the \(.key)", .value[])
Output
Now working on Chicago
Chicago has the following arrays:
array20
Content of the array20
1
2
Now working on Denver
Denver has the following arrays:
array30
Content of the array30
3
Now working on Reno
Reno has the following arrays:
array50
Content of the array50
1
Now working on Seattle
Seattle has the following arrays:
array10
array11
Content of the array10
1
2
Content of the array11
3
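If it helps to see the same logic outside jq, here is a Python transcription of the program above (a sketch for comparison, not a replacement for jq; report is a hypothetical name): sorting and then grouping mirrors group_by(.city), and the startswith filter mirrors select(test("^array")). With the second JSON layout from the question, the grouping step would not be needed, since each city's arrays are already collected under content.

```python
from itertools import groupby
from operator import itemgetter

DATA = [
    {"city": "Seattle", "array10": ["1", "2"]},
    {"city": "Seattle", "array11": ["3"]},
    {"city": "Chicago", "array20": ["1", "2"]},
    {"city": "Denver", "array30": ["3"]},
    {"city": "Reno", "array50": ["1"]},
]


def report(data):
    lines = []
    by_city = itemgetter("city")
    # group_by(.city): sort by city, then group; sorted() is stable, so objects
    # keep their input order within a city.
    for city, group in groupby(sorted(data, key=by_city), key=by_city):
        # keys_unsorted[] | select(test("^array")), paired with the values
        arrays = [(k, v) for obj in group for k, v in obj.items() if k.startswith("array")]
        lines.append(f"Now working on {city}")
        lines.append(f"{city} has the following arrays:")
        lines.extend(name for name, _ in arrays)
        for name, values in arrays:
            lines.append(f"Content of the {name}")
            lines.extend(values)
    return lines


print("\n".join(report(DATA)))
```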

How can I access the elements of a multidimensional JSON array (in SQL Server)?

I have a multidimensional JSON array that I access in SQL Server, using OPENJSON to convert the JSON data to rows. I am having trouble fetching the data from the nested array:
Declare @Json nvarchar(max)
Set @Json= '[{
"id": 0,
"healthandSafety": "true",
"estimationCost": "7878",
"comments": "\"Comments\"",
"image": [{
"imageData": "1"
}, {
"imageData": "2"
}, {
"imageData": "3"
}, {
"imageData": "4"
}, {
"imageData": "5"
}]
}, {
"id": 1,
"healthandSafety": "false",
"estimationCost": "90",
"comments": "\"89089\"",
"image": [{
"imageData": "6"
}, {
"imageData": "7"
}, {
"imageData": "8"
}, {
"imageData": "9"
}, {
"imageData": "10"
}, {
"imageData": "11"
}]
}]'
Select ImageJsonFile from OPENJSON (@Json) with (ImageJsonFile nvarchar(max) '$.image[0].imageData')
When I tried the above code, I got the following output:
ImageJsonFile
1
6
The output I am expecting:
ImageJsonFile
1
2
3
4
5
You need to define the query path, so that OPENJSON starts at the image array of the first element:
Select * from OPENJSON (@Json,'$[0].image') with (ImageJsonFile nvarchar(max) '$.imageData')
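The difference between the two paths may be easier to see in a small Python analogue, where the JSON document becomes nested lists and dicts. This illustrates the path navigation only, not how SQL Server evaluates OPENJSON, and the data is trimmed: the question's column path picks the first image of every top-level element, which is why 1 and 6 come back, while the answer's path argument moves into the first element's image array before any rows are produced.

```python
import json

# Trimmed version of the question's document.
doc = json.loads(
    '[{"id": 0, "image": [{"imageData": "1"}, {"imageData": "2"}, {"imageData": "3"}]},'
    ' {"id": 1, "image": [{"imageData": "6"}, {"imageData": "7"}]}]'
)

# Question: OPENJSON over the top-level array, and the column path
# '$.image[0].imageData' reads only element 0 of each image array.
original = [obj["image"][0]["imageData"] for obj in doc]

# Answer: the path argument '$[0].image' first navigates to the image array
# of the first object, then '$.imageData' is read from every element of it.
answer = [img["imageData"] for img in doc[0]["image"]]

print(original)  # ['1', '6']
print(answer)    # ['1', '2', '3']
```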
You've got an answer already, so this is just to add some more details:
The following returns all the data from your multidimensional array, not just the one array index you would otherwise have to specify explicitly.
DECLARE @Json NVARCHAR(MAX)=
N'[{
"id": 0,
"healthandSafety": "true",
"estimationCost": "7878",
"comments": "\"Comments\"",
"image": [{
"imageData": "1"
}, {
"imageData": "2"
}, {
"imageData": "3"
}, {
"imageData": "4"
}, {
"imageData": "5"
}]
}, {
"id": 1,
"healthandSafety": "false",
"estimationCost": "90",
"comments": "\"89089\"",
"image": [{
"imageData": "6"
}, {
"imageData": "7"
}, {
"imageData": "8"
}, {
"imageData": "9"
}, {
"imageData": "10"
}, {
"imageData": "11"
}]
}]';
--The query
SELECT A.id
,A.healthandSafety
,A.estimationCost
,A.comments
,B.imageData
FROM OPENJSON(@Json)
WITH(id INT
,healthandSafety BIT
,estimationCost INT
,comments NVARCHAR(1000)
,[image] NVARCHAR(MAX) AS JSON ) A
CROSS APPLY OPENJSON(A.[image])
WITH(imageData INT) B;
The result
+----+-----------------+----------------+----------+-----------+
| id | healthandSafety | estimationCost | comments | imageData |
+----+-----------------+----------------+----------+-----------+
| 0 | 1 | 7878 | Comments | 1 |
+----+-----------------+----------------+----------+-----------+
| 0 | 1 | 7878 | Comments | 2 |
+----+-----------------+----------------+----------+-----------+
| 0 | 1 | 7878 | Comments | 3 |
+----+-----------------+----------------+----------+-----------+
| 0 | 1 | 7878 | Comments | 4 |
+----+-----------------+----------------+----------+-----------+
| 0 | 1 | 7878 | Comments | 5 |
+----+-----------------+----------------+----------+-----------+
| 1 | 0 | 90 | 89089 | 6 |
+----+-----------------+----------------+----------+-----------+
| 1 | 0 | 90 | 89089 | 7 |
+----+-----------------+----------------+----------+-----------+
| 1 | 0 | 90 | 89089 | 8 |
+----+-----------------+----------------+----------+-----------+
| 1 | 0 | 90 | 89089 | 9 |
+----+-----------------+----------------+----------+-----------+
| 1 | 0 | 90 | 89089 | 10 |
+----+-----------------+----------------+----------+-----------+
| 1 | 0 | 90 | 89089 | 11 |
+----+-----------------+----------------+----------+-----------+
The idea in short:
We use the first OPENJSON to get the elements of the first level. The WITH clause names all the elements and returns [image] as NVARCHAR(MAX) AS JSON. This allows us to use a second OPENJSON (via CROSS APPLY) to read the numbers from imageData, your nested dimension, while the id column acts as the grouping key.
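The outer OPENJSON plus CROSS APPLY OPENJSON pattern corresponds to a nested loop: one pass over the first-level elements, and for each of them one pass over its image array. The Python sketch below is only an analogy on trimmed data (flatten is a hypothetical name), with the conversions standing in for the types in the WITH clauses. As with CROSS APPLY, an element whose image array is empty would produce no rows; OUTER APPLY would keep it with NULLs.

```python
import json

# Trimmed sample; the raw string keeps the \" escapes from the original JSON.
SAMPLE = r'''[
 {"id": 0, "healthandSafety": "true", "estimationCost": "7878", "comments": "\"Comments\"",
  "image": [{"imageData": "1"}, {"imageData": "2"}]},
 {"id": 1, "healthandSafety": "false", "estimationCost": "90", "comments": "\"89089\"",
  "image": [{"imageData": "6"}, {"imageData": "7"}, {"imageData": "8"}]}
]'''


def flatten(text):
    rows = []
    for a in json.loads(text):            # first OPENJSON ... WITH (...) A
        for b in a["image"]:              # CROSS APPLY OPENJSON(A.[image]) B
            rows.append((
                a["id"],
                a["healthandSafety"] == "true",   # stands in for the BIT column
                int(a["estimationCost"]),
                a["comments"],
                int(b["imageData"]),
            ))
    return rows


for row in flatten(SAMPLE):
    print(row)
```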

PostgreSQL json and array processing

I need to output JSON from a query.
Input data:
Documents:
==========
id | name | team
------------------
1 | doc1 | {"authors": [1, 2, 3], "editors": [3, 4, 5]}
Persons:
========
id | name |
--------------
1 | Person1 |
2 | Person2 |
3 | Person3 |
4 | Person4 |
5 | Person5 |
Query:
select d.id, d.name,
(select jsonb_build_object(composed)
from
(
select teamGrp.key,
(
select json_build_array(persAgg) from
(
select
(
select jsonb_agg(pers) from
(
select person.id, person.name
from
persons
where (persList.value)::int=person.id
) pers
)
from
json_array_elements_text(teamGrp.value::json) persList
) persAgg
)
from
jsonb_each_text(d.team) teamGrp
) teamed
) as teams
from
documents d;
and I expect the following output:
{"id": 1, "name": "doc1", "teams":
{"authors": [{"id": 1, "name": "Person1"}, {"id": 2, "name": "Person2"}, {"id": 3, "name": "Person3"}],
"editors": [{"id": 3, "name": "Person3"}, {"id": 4, "name": "Person4"}, {"id": 5, "name": "Person5"}]}}
But I receive an error:
ERROR: more than one row returned by a subquery used as an expression
Where is the problem, and how can I fix it?
PostgreSQL 9.5
I think the following (rather complicated) query should do it:
SELECT
json_build_object(
'id',id,
'name',name,
'teams',(
SELECT json_object_agg(team_name,
(SELECT
json_agg(json_build_object('id',value,'name',Persons.name))
FROM json_array_elements(team_members)
INNER JOIN Persons ON (value#>>'{}')::integer=Persons.id
)
)
FROM json_each(team) t(team_name,team_members)
)
)
FROM Documents;
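I am using subqueries that run JSON aggregates: json_agg collects the members of each team and json_object_agg collects the teams into one object, so every subquery used as an expression returns exactly one row, which avoids the "more than one row returned" error.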
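The shape of that query, one aggregate per nesting level, can be mirrored in Python to check the expected output. This is a sketch using the sample data from the question; build is a hypothetical name. Each inner list corresponds to one json_agg and the teams dict to json_object_agg. Two differences from the SQL: without an ORDER BY inside json_agg, PostgreSQL does not guarantee the order of the members, and json_agg over zero rows returns NULL rather than an empty array.

```python
# Sample tables from the question.
PERSONS = {1: "Person1", 2: "Person2", 3: "Person3", 4: "Person4", 5: "Person5"}
DOCUMENT = {"id": 1, "name": "doc1",
            "team": {"authors": [1, 2, 3], "editors": [3, 4, 5]}}


def build(doc, persons):
    teams = {}
    # json_each(team): one (team_name, team_members) pair per key
    for team_name, members in doc["team"].items():
        # json_agg(...) over json_array_elements(team_members) JOIN Persons:
        # a single list per team, so the subquery yields a single value.
        teams[team_name] = [
            {"id": pid, "name": persons[pid]}
            for pid in members
            if pid in persons          # INNER JOIN drops unknown ids
        ]
    # json_object_agg collects all teams; json_build_object wraps the result.
    return {"id": doc["id"], "name": doc["name"], "teams": teams}


print(build(DOCUMENT, PERSONS))
```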

Database design from a json file

I have a JSON file like this:
[
{
"topic": "Example1",
"ref": {
"1": "Example Topic",
"2": "Topic"
},
"contact": [
{
"ref": [
1
],
"corresponding": true,
"name": "XYZ"
},
{
"ref": [
1
],
"name": "ZXY"
},
{
"ref": [
1
],
"name": "ABC"
},
{
"ref": [
1,
2
],
"name":"BCA"
}
] ,
"type": "Presentation"
},
{
"topic": "Example2",
"ref": {
"1": "Example Topic",
"2": "Topic"
},
"contact": [
{
"ref": [
1
],
"corresponding": true,
"name": "XYZ"
},
{
"ref": [
1
],
"name": "ZXY"
},
{
"ref": [
1
],
"name": "ABC"
},
{
"ref": [
1,
2
],
"name":"BCA"
}
] ,
"type": "Poster"
}
]
I created three tables, Items, Reference and Contact:
Items:
Item_ID
topic
type
reference:
ref_ID
content
Contact:
ref_ID
contact_ID
Item_ID
name
Relationships:
1) Items have many references
2) Items have many authors
3) Authors have many references
Now, my questions are:
1) Am I doing anything wrong here?
2) Is there any way to improve my current implementation?
3) I am confused about how to implement "corresponding" (inside the contact array). How do I model that in the design?
Thanks.
From your JSON above, this is the normalized schema I could infer. Note that there are two different "ref" fields in your JSON (the topic-level map and the per-contact arrays); could you clarify them?
Also, here is a useful link: http://jsonviewer.stack.hu/ (switch between the Viewer and Text tabs).
Applied to your example, the tables look like this:
P - Primary key
Ref - Reference (foreign) key
Topic:
--------------------------------------------------
Topic ID (P) | TopicName | TypeID (Ref)
----------------------------------------------------
0 Example1 0
1 Example2 1
TopicReferences :
----------------------------
TopicID (P) | ReferenceID (Ref)
--------------------------------
0 0
0 1
1 0
1 1
Reference :
------------------------------------
ReferenceID (P) | ReferenceName
------------------------------------
0 Example Topic
1 Topic
Presentation Type :
--------------------------
TypeID (P) | TypeName
--------------------------
0 Presentation
1 Poster
TopicContacts:
---------------------------------
TopicID | ContactID (Ref)
---------------------------------
0 0
0 1
0 2
0 3
1 0
1 1
1 2
1 3
Contact:
-------------------------------------------------------------------
ContactID(P) | ContactName | IsCorresponding ( Boolean, nullable)
------------------------------------------------------------------
0 XYZ YES
1 ZXY NULL
2 ABC NULL
3 BCA NULL
ContactsReference2:
--------------------------------------------
ContactID | Reference2ID (Ref)
--------------------------------------------
0 0
1 0
2 0
3 0
3 1
Reference2:
--------------------------------------------
Reference2ID(P) | Reference2Value (NUM)
--------------------------------------------
0 1
1 2
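To show how the JSON maps onto rows for this schema, in particular how the optional "corresponding" flag becomes a nullable IsCorresponding column, here is a small Python sketch. It follows the answer's choice of de-duplicating contacts by name, stores the raw ref numbers instead of a separate Reference2 lookup table, and leaves out the topic-level ref map; normalize and the trimmed sample are illustrative only.

```python
import json

# Trimmed sample with the same shape as the question's file.
SAMPLE = json.loads("""[
 {"topic": "Example1", "type": "Presentation",
  "contact": [{"ref": [1], "corresponding": true, "name": "XYZ"},
              {"ref": [1], "name": "ZXY"},
              {"ref": [1, 2], "name": "BCA"}]},
 {"topic": "Example2", "type": "Poster",
  "contact": [{"ref": [1], "corresponding": true, "name": "XYZ"},
              {"ref": [1], "name": "ZXY"},
              {"ref": [1, 2], "name": "BCA"}]}
]""")


def normalize(items):
    topics, contacts, topic_contacts, contact_refs = [], {}, [], set()
    for topic_id, item in enumerate(items):
        topics.append((topic_id, item["topic"], item["type"]))
        for c in item["contact"]:
            # Contacts are de-duplicated by name (assumption, as in the tables above).
            if c["name"] not in contacts:
                # "corresponding" appears on only some contacts; absent -> None (NULL).
                contacts[c["name"]] = (len(contacts), c.get("corresponding"))
            contact_id = contacts[c["name"]][0]
            topic_contacts.append((topic_id, contact_id))
            for ref in c["ref"]:
                contact_refs.add((contact_id, ref))
    return topics, contacts, topic_contacts, sorted(contact_refs)


topics, contacts, topic_contacts, contact_refs = normalize(SAMPLE)
for name, (contact_id, corresponding) in contacts.items():
    print(contact_id, name, corresponding)
```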
