Query for condition in array of JSON objects in PostgreSQL

Let's assume we have a PostgreSQL database with a table containing rows of the following kind:
id | doc
---+-----------------
1 | JSON Object
2 | JSON Object
3 | JSON Object
...
The JSON has the following structure:
{
'header' : {
'info' : 'foo'},
'data' :
[{'a' : 1, 'b' : 123},
{'a' : 2, 'b' : 234},
{'a' : 1, 'b' : 543},
...
{'a' : 1, 'b' : 123},
{'a' : 4, 'b' : 452}]
}
with arbitrary values for 'a' and 'b' in 'data' in all rows of the table.
First question: how do I query for rows in the table where the following condition holds:
There exists a dictionary in the list/array with the key 'data', where a==i and b>j.
For example, for i=1 and j=400 the condition would be fulfilled for the example above and the respective row would be returned.
Second question:
In my problem I have to deal with time series data in JSON. Every measurement is represented by one JSON document and therefore one row in the table. I want to identify measurements where certain events occurred. In case the above structure is unsuitable for easy querying: what could such a time series look like to be more easily queryable?
Thanks a lot!

I believe a query like this should answer your first question:
select distinct id, doc
from (
select id, doc, jsonb_array_elements(doc->'data') as elem
from docs
) as docelem
where (elem->>'a')::int = 4 and (elem->>'b')::int > 400
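To sanity-check the condition outside the database, the same logic can be sketched in plain Python. This is only an illustration of what the query filters for, not PostgreSQL code; the function name `has_match` and the sample rows are made up for the example.

```python
from typing import Any


def has_match(doc: dict[str, Any], i: int, j: int) -> bool:
    """True if any element of doc['data'] has a == i and b > j."""
    return any(
        elem.get("a") == i and elem.get("b", float("-inf")) > j
        for elem in doc.get("data", [])
    )


# Hypothetical rows shaped like the table in the question: (id, doc)
rows = [
    (1, {"header": {"info": "foo"},
         "data": [{"a": 1, "b": 123}, {"a": 2, "b": 234}, {"a": 1, "b": 543}]}),
    (2, {"header": {"info": "bar"},
         "data": [{"a": 1, "b": 100}, {"a": 4, "b": 452}]}),
]

# Each row is reported at most once, which is what DISTINCT achieves in the SQL.
matching_ids = [row_id for row_id, doc in rows if has_match(doc, i=1, j=400)]
print(matching_ids)
```

Because `any()` stops at the first matching element, a row never shows up twice, so no de-duplication step is needed in this version.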

Related

Count nested JSON array elements over all result rows

I have a SQL query that I am running in order to get results, where one of the columns contains a JSON array.
I want to count the total number of JSON array elements across all returned rows.
I.e. if 2 rows were returned, where one row had 3 JSON array items in metadata column, and the second row had 4 JSON array items in metadata column, I'd like to see 7 as a returned count.
Is this possible?
This is my current SQL query:
WITH _result AS (
SELECT lo.*
FROM laser.laser_checks la
JOIN laser.laser_brands lo ON la.id = lo.brand_id
WHERE lo.type not in (1)
AND la.source in (1,4,5)
AND la.prod_id in (1, 17, 19, 22, 27, 29)
)
SELECT ovr.json -> 'id' AS object_uuid,
ovr.json -> 'username' AS username,
image.KEY AS image_uuid,
image.value AS metadata,
user_id as user_uuid
FROM _result ovr,
jsonb_array_elements(ovr."json" -> 'images') elem,
jsonb_each(elem) image
Unpack the arrays and count the elements:
WITH q AS (/* your query */)
SELECT object_uuid,
username,
image_uuid,
metadata,
user_uuid,
sum(elemcount) OVER () AS total_array_elements
FROM (SELECT q.object_uuid,
q.username,
q.image_uuid,
q.metadata,
q.user_uuid,
count(a.e) AS elemcount
FROM q
LEFT JOIN LATERAL jsonb_array_elements(q.metadata) AS a(e)
ON TRUE
GROUP BY q.object_uuid,
q.username,
q.image_uuid,
q.metadata,
q.user_uuid
) AS p;
An elephant in this room has managed to escape everybody's attention: jsonb_array_length().
(Or json_array_length() for json.)
The manual:
Returns the number of elements in the top-level JSON array.
After you have already unnested the JSON array to your level of interest, you can apply the function to the (now) top level. Wrap it in a window function to get total counts for every result row.
Your query should work like this:
SELECT lo.json -> 'id' AS object_uuid
, lo.json -> 'username' AS username
, image.key AS image_uuid
, image.value AS metadata
, lo.user_id AS user_uuid
, sum(jsonb_array_length(image.value)) OVER () AS total_array_elements -- !!!
FROM laser.laser_checks la
JOIN laser.laser_brands lo ON la.id = lo.brand_id
, jsonb_array_elements(lo."json" -> 'images') elem
, jsonb_each(elem) image
WHERE lo.type NOT IN (1)
AND la.source IN (1,4,5)
AND la.prod_id IN (1, 17, 19, 22, 27, 29);
No need for a LATERAL subquery, aggregation, or even a CTE, really.
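What the window function computes can be sketched in plain Python: take the length of each row's array, sum the lengths over all rows, and attach that same total to every row. The rows and the `image_uuid` values below are invented for illustration; they mirror the 3 + 4 = 7 example from the question.

```python
# Hypothetical result rows; "metadata" stands in for the JSON array column.
rows = [
    {"image_uuid": "img-1", "metadata": ["x", "y", "z"]},
    {"image_uuid": "img-2", "metadata": [1, 2, 3, 4]},
]

# len() plays the role of jsonb_array_length(); sum() over all rows
# plays the role of sum(...) OVER ().
total = sum(len(row["metadata"]) for row in rows)

# Every output row carries the same grand total.
result = [{**row, "total_array_elements": total} for row in rows]

for row in result:
    print(row["image_uuid"], row["total_array_elements"])
```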
Related:
Sort by length of nested JSON array

Return Parts of an Array in Postgres

I have a column (text) in my Postgres DB (v.10) with a JSON format.
As far as I know, it has an array format.
Here is a fiddle example: Fiddle
If table1 = persons and change_type = create, then I only want to return the name and firstname concatenated as one field and clear the rest of the text.
Output should be like this:
id | table1 | did | execution_date | change_type | attr | context_data
1 | Persons | 1 | 2021-01-01 | Create | Name | [["+","name","Leon Bill"]]
1 | Persons | 2 | 2021-01-01 | Update | Firt_name | [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
1 | Users | 3 | 2021-01-01 | Create | Street | [["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],["+","versand_art",null],["+","email",null],["+","firstname","Bill"],["+","code_cd",null]]
Disassemble the JSON array into a set of rows using the json_array_elements function, then assemble it back into the structure you want.
select m.*
, case
when m.table1 = 'Persons' and m.change_type = 'Create'
then (
select '[["+","name",' || to_json(string_agg(a.value->>2,' ' order by a.value->>1 desc))::text || ']]'
from json_array_elements(m.context_data::json) a
where a.value->>1 in ('name','firstname')
)
else m.context_data
end as context_data
from mutations m
(Note:
relying on the alphabetical ordering of the required field names is a little bit dirty; an explicit order by case would improve readability
the resulting JSON is assembled from string literals as much as possible, since you didn't specify whether "+" should be taken from any of the original array elements
the to_json()::text is just for safety against injection
)
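The disassemble-and-reassemble step can also be sketched in plain Python for comparison. This is an illustration of the same logic, not part of the SQL answer; the function name `collapse_name` is made up, and it assumes every entry is a three-element array as in the sample data.

```python
import json


def collapse_name(context_data: str) -> str:
    """Keep only name and firstname, joined into a single "name" entry."""
    entries = json.loads(context_data)
    wanted = [
        e for e in entries
        if e[1] in ("name", "firstname") and e[2] is not None
    ]
    # 'name' sorts after 'firstname', so descending order puts it first,
    # mirroring "order by a.value->>1 desc" in the query.
    wanted.sort(key=lambda e: e[1], reverse=True)
    full_name = " ".join(e[2] for e in wanted)
    # json.dumps escapes the value, like to_json() does in the query.
    return json.dumps([["+", "name", full_name]], separators=(",", ":"))


sample = (
    '[["+","cur_nr","12345"],["+","art_cd","1"],["+","name","Leon"],'
    '["+","versand_art",null],["+","email",null],'
    '["+","firstname","Bill"],["+","code_cd",null]]'
)
print(collapse_name(sample))
```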

Snowflake Retrieve value from Semi Structured Data

I'm trying to retrieve the health value from Snowflake semi structured data in a variant column called extra from table X.
An example of the data can be seen below:
[
{
"party":
"[{\"class\":\"Farmer\",\"gender\":\"Female\",\"ethnicity\":\"NativeAmerican\",\"health\":2},
{\"class\":\"Adventurer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":3},
{\"class\":\"Farmer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":0},
{\"class\":\"Banker\",\"gender\":\"Female\",\"ethnicity\":\"White\",\"health\":0}
}
]
I have tried reading the Snowflake documentation from https://community.snowflake.com/s/article/querying-semi-structured-data
I have also tried the following queries to flatten the query:
SELECT result.value:health AS PartyHealth
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PartyHealth > 0,
TABLE(FLATTEN(X, 'party')) result
AND
SELECT [0]['party'][0]['health'] AS Health
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PH > 0;
I am trying to retrieve the health value from table X, from the column extra, which contains the variant party with 4 repeating values [0-3]. I'm not sure how to do this. Is someone able to tell me how to query semi-structured data in Snowflake, considering the documentation doesn't make much sense to me?
First, the JSON value you posted seems to be malformed (might be a copy/paste issue).
Here's an example that works:
first your JSON formatted:
[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]
create a table to test:
CREATE OR REPLACE TABLE myvariant (v variant);
insert the JSON value into this table:
INSERT INTO myvariant SELECT PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]');
now, to select a value, start from the column name (v in my case). Since your JSON is wrapped in an array, I first pick element [0] and then drill down from there, like this:
SELECT v[0]:party[0].health FROM myvariant;
The above gives me 2, the health of the first party member.
For the other rows you can simply do:
SELECT v[0]:party[1].health FROM myvariant;
SELECT v[0]:party[2].health FROM myvariant;
SELECT v[0]:party[3].health FROM myvariant;
Another option might be to make the data more like a table. I find that easier to work with than the JSON :-)
The code below can be copied and pasted as-is and runs in Snowflake.
The key documentation to read is the one on LATERAL FLATTEN.
SELECT d4.path, d4.value
from
lateral flatten(input=>PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]') ) as d ,
lateral flatten(input=> value) as d2 ,
lateral flatten(input=> d2.value) as d3 ,
lateral flatten(input=> d3.value) as d4
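For readers who want to check the expected values without a Snowflake session, the path `v[0]:party[n].health` can be mimicked in plain Python with the standard `json` module. This is only a sketch of the navigation; the variable names are made up, and the JSON is the corrected sample from above.

```python
import json

raw = (
    '[{ "party": [ '
    '{"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, '
    '{"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, '
    '{"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, '
    '{"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]'
)

doc = json.loads(raw)

# doc[0]["party"][n]["health"] corresponds to v[0]:party[n].health
healths = [member["health"] for member in doc[0]["party"]]
print(healths)

# The question also filters on health > 0.
positive = [h for h in healths if h > 0]
print(positive)
```

Iterating over `doc[0]["party"]` is roughly what flattening the party array does: one output row per array element.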

Filter Condition in Business Object BO

I have a problem with a filter condition in BO.
Imagine that I have this database
ID | DESC
0 | None
1 | Company
2 | All
In BO I have a filter that asks where you want to find the objects and offers 2 options:
"Company" or "All".
If I choose "All" then I should get all the data with the "ID" 0, 1, 2, and if I choose "Company" only the data with the "ID" 1.
So I did something like this:
TABLE_NAME.ID <= (CASE WHEN @Prompt('where do you want to find the objects','A',{'Company', 'All'},mono,constrained,not_persistent,{'Company'}) = 'Company' THEN 1 ELSE 2 END)
This filter is OK when I choose "All", because I get all the "ID" values up to 2, i.e. 0, 1, 2.
But it does not work when my option is "Company", because it also shows the data with the "ID" 0.
I need something that behaves like "=" in one case and like "<=" in the other.
If it's really only that simple, the following will work:
TABLE_NAME.ID =
(CASE @Prompt('where do you want to find the objects',
'A',
{'Company', 'All'},
mono,
constrained,
not_persistent,{'Company'}
)
WHEN 'Company'
THEN 1
WHEN 'All'
THEN TABLE_NAME.ID
END)
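The trick is that in the 'All' branch the ID is compared to itself, which holds for every row with a non-null ID. A small plain-Python model of that filter, with made-up names and the three sample rows from the question, shows the effect:

```python
# Sample table from the question: (ID, DESC)
rows = [(0, "None"), (1, "Company"), (2, "All")]


def filter_rows(rows, choice):
    """Model of: ID = CASE choice WHEN 'Company' THEN 1 WHEN 'All' THEN ID END."""
    def target_id(row_id):
        if choice == "Company":
            return 1
        if choice == "All":
            return row_id  # comparing the ID to itself is always true
        return None  # no branch matched: models NULL, which never compares equal

    return [row for row in rows if row[0] == target_id(row[0])]


print(filter_rows(rows, "Company"))
print(filter_rows(rows, "All"))
```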

How to delete array element in JSONB column based on nested key value?

How can I remove an object from an array, based on the value of one of the object's keys?
The array is nested within a parent object.
Here's a sample structure:
{
"foo1": [ { "bar1": 123, "bar2": 456 }, { "bar1": 789, "bar2": 42 } ],
"foo2": [ "some other stuff" ]
}
Can I remove an array element based on the value of bar1?
I can query based on the bar1 value using: columnname @> '{ "foo1": [ { "bar1": 123 } ]}', but I've had no luck finding a way to remove { "bar1": 123, "bar2": 456 } from foo1 while keeping everything else intact.
Thanks
Running PostgreSQL 9.6
Assuming that you want to search for a specific object with an inner object of a certain value, and that this specific object can appear anywhere in the array, you need to unpack the document and each of the arrays, test the inner sub-documents for containment and delete as appropriate, then re-assemble the array and the JSON document (untested):
SELECT id, jsonb_object_agg(key, jarray)
FROM (
SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
FROM ( SELECT id, key, value
FROM my_table, jsonb_each(jdoc) ) foo,
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
GROUP BY 1, 2 ) x
GROUP BY 1;
Now, this may seem a little dense, so picked apart you get:
SELECT id, key, value
FROM my_table, jsonb_each(jdoc)
This uses a lateral join on your table to take the JSON document jdoc and turn it into a set of rows foo(id, key, value) where the value contains the array. The id is the primary key of your table.
Then we get:
SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
FROM foo, -- abbreviated from above
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
GROUP BY 1, 2
This uses another lateral join to unpack the arrays into bar(value) rows. These objects can now be searched with the containment operator to remove the objects from the result set: WHERE NOT bar.value @> '{"bar1": 123}'::jsonb. In the select list the arrays are re-assembled by id and key but now without the offending sub-documents.
Finally, in the main query the JSON documents are re-assembled:
SELECT id, jsonb_object_agg(key, jarray)
FROM x -- from above
GROUP BY 1;
The important thing to understand is that PostgreSQL JSON functions only operate on the level of the JSON document that you can explicitly indicate. Usually that is the top level of the document, unless you have an explicit path to some level in the document (like {foo1, 0, bar1}, but you don't have that). At that level of operation you can then unpack to do your processing such as removing objects.
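The unpack, filter, re-assemble idea can be illustrated in plain Python as well. This is a rough sketch, not PostgreSQL: the helper `contains` is a simplified stand-in for jsonb containment that only handles nested objects and scalars, and both function names are made up. It assumes, like the query, that every top-level value is an array.

```python
def contains(value, pattern):
    """Simplified containment check: does value contain every key/value of pattern?"""
    if isinstance(pattern, dict):
        return isinstance(value, dict) and all(
            key in value and contains(value[key], sub)
            for key, sub in pattern.items()
        )
    return value == pattern


def remove_matching(doc, pattern):
    """Rebuild doc, dropping array elements that contain pattern."""
    return {
        key: [elem for elem in array if not contains(elem, pattern)]
        for key, array in doc.items()
    }


doc = {
    "foo1": [{"bar1": 123, "bar2": 456}, {"bar1": 789, "bar2": 42}],
    "foo2": ["some other stuff"],
}

cleaned = remove_matching(doc, {"bar1": 123})
print(cleaned)
```

One difference worth noting: this sketch keeps a key with an empty list when all of its elements are removed, whereas the inner joins in the query would, as far as I can tell, drop such a key from the result.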
