SQL query to search inside a JSON object array

Consider the below JSON object
[
    {
        "startdt": "10/13/2021",
        "enddt": "10/13/2022",
        "customerName1": "John",
        "customerName2": "CA"
    },
    {
        "startdt": "10/14/2021",
        "enddt": "10/14/2022",
        "customerName1": "Jacob",
        "customerName2": "NJ"
    }
]
This is the value present in the column "custjson" of a table "CustInfo" in a Postgres DB. I want to search the data on the field customerName1. I created the query below, but it searches the whole object, so that if I give customerName1 as "Jacob" it returns the whole array. I want to search only the matching array element and return just that.
SELECT DISTINCT ON (e.id) e.*,
       (jsonb_array_elements(e.custjson) ->> 'customerName1') AS name1
FROM CustInfo e
CROSS JOIN jsonb_array_elements(e.custjson) ej
WHERE ej.value ->> 'customerName1' LIKE '%Jacob%'
Is there a way to search for and return only the array element whose customerName1 matches, instead of the whole JSON?
For example, if I search for Jacob, I should get the following instead of the whole JSON:
{
    "startdt": "10/14/2021",
    "enddt": "10/14/2022",
    "customerName1": "Jacob",
    "customerName2": "NJ"
}
Any help would be greatly appreciated.

You can use a JSON path expression to find the array element with a matching customer name:
select e.id,
       jsonb_path_query_array(e.custjson, '$[*] ? (@.customerName1 like_regex "Jacob")')
from custinfo e
Based on your sample data, this returns:
id | jsonb_path_query_array
---+----------------------------------------------------------------------------------------------------
1 | [{"enddt": "10/14/2022", "startdt": "10/14/2021", "customerName1": "Jacob", "customerName2": "NJ"}]
If you are using an older Postgres version that doesn't support JSON path queries (they require Postgres 12 or later), you need to unnest and aggregate manually:
select e.id,
       (select jsonb_agg(element)
        from jsonb_array_elements(e.custjson) as x(element)
        where x.element ->> 'customerName1' like '%Jacob%')
from custinfo e
This assumes that custjson is defined with the data type jsonb (which it should be). If not, you need to cast it: custjson::jsonb
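For example, a minimal sketch if custjson were stored as json or text rather than jsonb (the first query from above, with the cast added):
select e.id,
       jsonb_path_query_array(e.custjson::jsonb, '$[*] ? (@.customerName1 like_regex "Jacob")')
from custinfo e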

Related

Is it possible to parse json by using select statement in Netezza?

I have json data in one of the columns of my table, and I would like to parse the json data using a select statement in Netezza. I am not able to figure it out.
Can you all help me to solve this problem?
Let's say I have TableA and this table has a column Customer_detail. Data from the customer_detail field looks like this:
'{"Customer":[{"id":"1","name":"mike","address":"NYC"}]}'
Now I would like to query id from customer object of customer_detail column.
Thanks in advance.
From NPS 11.1.0.0 onwards, you can parse and use the json data type natively in NPS.
Here's an example:
SYSTEM.ADMIN(ADMIN)=> create table jtest(c1 jsonb);
CREATE TABLE
SYSTEM.ADMIN(ADMIN)=> insert into jtest values('{"name": "Joe Smith", "age": 28, "sports": ["football", "volleyball", "soccer"]}');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> insert into jtest values('{"name": "Jane Smith", "age": 38, "sports": ["volleyball", "soccer"]}');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> select * from jtest;
C1
----------------------------------------------------------------------------------
{"age": 28, "name": "Joe Smith", "sports": ["football", "volleyball", "soccer"]}
{"age": 38, "name": "Jane Smith", "sports": ["volleyball", "soccer"]}
(2 rows)
SYSTEM.ADMIN(ADMIN)=> select c1 -> 'name' from jtest where c1 -> 'age' > 20::jsonb ;
?COLUMN?
--------------
"Joe Smith"
"Jane Smith"
(2 rows)
You can refer to https://www.ibm.com/support/knowledgecenter/SSTNZ3/com.ibm.ips.doc/postgresql/dbuser/r_dbuser_functions_expressions.html for more details as well.
Looking at the comment you put above, you can use something like:
select customer_detail::json -> 'Customer' -> 0 -> 'id' as id,
       customer_detail::json -> 'Customer' -> 0 -> 'name' as name
from ...
This will parse the text to json on every execution. A more performant option would be to convert customer_detail to the jsonb data type.
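A minimal sketch of that conversion, assuming NPS 11.1+ and using CTAS with a hypothetical table name (Netezza doesn't readily change a column's type in place):
-- tablea_jsonb is a hypothetical name; carry over the remaining TableA columns as needed
create table tablea_jsonb as
select customer_detail::jsonb as customer_detail
from TableA;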
If the NPS version is below 11.1.x, then the json handling needs to be done (a) externally, i.e. using SQL to get the json data and then processing it outside the database, or (b) using a UDF - creating a UDF that supports json parsing.
E.g., using the programming language of your choice, process the json outside of SQL:
import nzpy # install using "python3 -m pip install nzpy"
import os
import json
# assume NZ_USER, NZ_PASSWORD, NZ_DATABASE and NZ_HOST are set
con = nzpy.connect(user=os.environ["NZ_USER"],
                   password=os.environ["NZ_PASSWORD"], host=os.environ["NZ_HOST"],
                   database=os.environ["NZ_DATABASE"], port=5480)

with con.cursor() as cur:
    cur.execute('select customer_detail from ...')
    for (customer_detail,) in cur.fetchall():  # each row is a one-column tuple
        c = json.loads(customer_detail)
        print((c['Customer'][0]['name'], c['Customer'][0]['id']))
Or create a UDF that parses json and use that in the SQL query.
If none of those are options, and the json is always well formatted (i.e. no new lines, only one key called "id" and one key called "name", etc.), then a regex may be a way around it, though it's not recommended since it's not a real json parser:
select regexp_extract(customer_detail,
'"id"[[:space:]]*:[[:space:]]*"([^"]+)"', 1, 1) as id,
regexp_extract(customer_detail,
'"name"[[:space:]]*:[[:space:]]*"([^"]+)"', 1, 1) as name
....

Snowflake - extract JSON array string object values into pipe separated values

I have a nested JSON array, stored as a string object in a variant-type stage table, and I want to extract a particular string object value, populated as pipe-separated values if more than one object is found. Can someone help me achieve the desired output format, please?
Sample JSON data
{"issues": [
{
"expand": "",
"fields": {
"customfield_10010": [
"com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]",
"com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"
],
"customfield_10011": "obcd",
"customfield_10024": null,
"customfield_10034": null,
"customfield_10035": null,
"customfield_10037": null,
},
"id": "123456",
"key": "SUE-1234",
"self": "xyz"
}]}
I don't have any idea how to separate the string objects inside an array in Snowflake.
By using the below query, I can get the whole string converted into pipe-separated values.
select
    a.value:id::number as ISSUE_ID,
    a.value:key::varchar as ISSUE_KEY,
    array_to_string(a.value:fields.customfield_10010, '|') as CF_10010_Data
from
    ABC.VARIANT_TABLE,
    lateral flatten( input => payload_json:issues) as a;
But I need to extract a particular string object value. Say, for example, the id values such as 1234 & 1239, populated as pipe-separated values as shown below.
ISSUE_ID  ISSUE_KEY  SPRINT_ID
123456    SUE-1234   1234|1239
Any idea on how to get the desired result is much appreciated. Thanks.
It looks like the data within the [...] for your sprints is just details about each sprint. I think it would be easiest for you to populate a separate sprints table with data on each sprint, and then join that table to the sprint ID values parsed from the API response you showed with issues data.
with
jira_responses as (
    select
        $1 as id,
        $2 as body
    from (values
        (1, '{"issues":[{"expand":"","fields":{"customfield_10010":["com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]","com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"],"customfield_10011":"obcd","customfield_10024":null,"customfield_10034":null,"customfield_10035":null,"customfield_10037":null},"id":"123456","key":"SUE-1234","self":"xyz"}]}')
    )
)
select
    issues.value:id::integer as issue_id,
    issues.value:key::string as issue_key,
    get(split(sprints.value::string, '['), 0)::string as sprint_id
from jira_responses,
    lateral flatten(input => parse_json(body):issues) issues,
    lateral flatten(input => parse_json(issues.value):fields:customfield_10010) sprints
Based on your sample data, this returns one row per sprint entry, with the portion of the string before the first bracket (e.g. com.atlassian.xxx.yyy.yyyy.Sprint#xyz456) as the sprint identifier.
See Snowflake reference docs below.
"Querying Semi-structured Data"
PARSE_JSON
FLATTEN
SPLIT
GET
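If you instead want the numeric sprint ids pipe-separated, as in your expected output, here is a hedged sketch, under the assumption that each sprint's bracketed details always contain ,id=<digits> (reusing the jira_responses CTE above):
select
    issues.value:id::integer as issue_id,
    issues.value:key::string as issue_key,
    -- pull the digits captured after ",id=" out of each sprint string, then pipe-join them per issue
    listagg(regexp_substr(sprints.value::string, ',id=([0-9]+)', 1, 1, 'e', 1), '|') as sprint_id
from jira_responses,
    lateral flatten(input => parse_json(body):issues) issues,
    lateral flatten(input => parse_json(issues.value):fields:customfield_10010) sprints
group by 1, 2
Based on the sample data, this should yield 1234|1239 for issue SUE-1234.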

Create a Nested/Repeating field using SQL in BigQuery which can be queried with dot notation (without UNNEST)

I am trying to build a data structure in BigQuery using SQL which exactly reflects the data structure I obtain when uploading JSON. This will enable me to query the view using SQL with dot notation instead of having to UNNEST, which I understand, but which many of my clients find extremely confusing and unintuitive.
If I build a really simple dummy dataset with a couple of rows and then nest using the ARRAY_AGG(STRUCT([field list])) pattern:
WITH
flat_table AS (
    SELECT "BigQuery" AS name, 23 AS user_count, "Data Warehouse" AS data_thing, 5 AS ease_of_use, "Awesome" AS description UNION ALL
    SELECT "MySQL" AS name, 12 AS user_count, "Database" AS data_thing, 3 AS ease_of_use, "Solid" AS description
)
SELECT
    name, user_count,
    ARRAY_AGG(STRUCT(data_thing, ease_of_use, description)) AS attributes
FROM flat_table
GROUP BY name, user_count
Then saving and viewing the schema shows that the attributes field is Type = RECORD and Mode = REPEATED. Schema field names are:
name
user_count
attributes
attributes.data_thing
attributes.ease_of_use
attributes.description
If I look at the COLUMN information in the INFORMATION_SCHEMA.COLUMNS query I can see that the attributes field is_nullable = NO and data_type = ARRAY<STRUCT<data_thing STRING, ease_of_use INT64, description STRING>>
If I want to query this structure I need to use the UNNEST pattern as below:
SELECT
name,
user_count
FROM
nested_table,
UNNEST(attributes)
WHERE
ease_of_use > 3
However when I upload the following JSON representation of the same data to BigQuery with automatic schema detection:
{"attributes":{"description":"Awesome","ease_of_use":5,"data_thing":"Data Warehouse"},"user_count":23,"name":"BigQuery"}
{"attributes":{"description":"Solid","ease_of_use":3,"data_thing":"Database"},"user_count":12,"name":"MySQL"}
The schema looks nearly identical once loaded, except for the attributes field is Mode = NULLABLE (it is still Type = RECORD). The INFORMATION_SCHEMA.COLUMNS shows me that the attributes field is now is_nullable = YES and data_type = STRUCT<data_thing STRING, ease_of_use INT64, description STRING>, i.e. now nullable and not in an array.
However the most interesting thing for me is that I can now query this table using dot notation instead of the UNNEST pattern, so the query above becomes:
SELECT
name,
user_count
FROM
nested_table_json
WHERE
attributes.ease_of_use > 3
Which is arguably easier to read, even in this trivial case. However once we get to more complex data structures with multiple nested fields and multi-level nesting, the UNNEST pattern becomes extremely difficult to write, QA and debug. The dot notation pattern appears to be much more intuitive and scalable.
So my question is: is it possible to build a data structure equivalent to the loaded JSON by writing queries in SQL, enabling us to build Standard SQL queries using dot notation and not requiring complex UNNEST patterns?
If you know that your ARRAY_AGG will produce a single element, you can drop the ARRAY notation like this:
SELECT
    name, user_count,
    ARRAY_AGG(STRUCT(data_thing, ease_of_use, description))[OFFSET(0)] AS attributes
FROM flat_table
GROUP BY name, user_count
Notice the use of OFFSET(0); this way the returned output will be:
[
    {
        "name": "BigQuery",
        "user_count": "23",
        "attributes": {
            "data_thing": "Data Warehouse",
            "ease_of_use": "5",
            "description": "Awesome"
        }
    }
]
which can be queried using dot notation.
If you just want to group the result in a STRUCT, you don't need ARRAY_AGG at all:
WITH
flat_table AS (
    SELECT "BigQuery" AS name, 23 AS user_count, STRUCT("Data Warehouse" AS data_thing, 5 AS ease_of_use, "Awesome" AS description) AS attributes UNION ALL
    SELECT "MySQL" AS name, 12 AS user_count, STRUCT("Database" AS data_thing, 3 AS ease_of_use, "Solid" AS description)
)
SELECT
    *
FROM flat_table
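Either way, dot notation then works directly. For example, reusing the flat_table CTE from the query just above with the same filter as the question:
SELECT
    name,
    user_count
FROM flat_table
WHERE attributes.ease_of_use > 3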

How to extract data from specific fields in a NESTED JSON using AWS Athena - Presto?

I have JSONs in the below format in a S3 bucket and I'm trying to extract only the "id", "label" & "value" from the "fields" key using Athena. I tried ARRAY-MAP but wasn't successful. Also, on the "value" field - I want the content to be captured as a simple text ignoring any list / dictionaries in it.
I also don't want to create any Hive schema for these JSONs and looking for a Presto SQL solution if possible.
{
    "reports": {
        "client": {
            "pdf": "https://reports.s3-accelerate.amazonaws.com/looks/123/reports/client.pdf",
            "html": "https://api.com/looks/123/reports/client.html"
        },
        "public": {
            "pdf": "https://s3.amazonaws.com/reports.com/looks/123/reports/public.pdf",
            "html": "https://api.look.com/looks/123/reports/public.html"
        }
    },
    "actors": {
        "looker": {
            "firstName": "Rosa",
            "lastName": "Mart"
        },
        "client": {
            "email": "XXX.XXX#XXXXXX.com",
            "firstName": "XXX",
            "lastName": "XXX"
        }
    },
    "_id": "123",
    "fields": [
        {
            "id": "fence_condition_missing_sections",
            "context": [
                "Fence Condition"
            ],
            "label": "Missing Sections",
            "type": "choice",
            "value": "None"
        },
        {
            "id": "photos_landscaped_area",
            "context": [
                "Landscaping Photos"
            ],
            "label": "Landscaped Area",
            "type": "photo-with-description",
            "value": [
                {
                    "description": "Front",
                    "photo": "https://reports-wegolook-com.s3-accelerate.amazonaws.com/looks/123/looker/1.jpg"
                },
                {
                    "description": "Front entrance ",
                    "photo": "https://reports-wegolook-com.s3-accelerate.amazonaws.com/looks/123/looker/2.jpg"
                }
            ]
        }
    ],
    "jobNumber": "xxx",
    "createdAt": "2018-10-11T22:39:37.223Z",
    "completedAt": "2018-01-27T20:13:49.937Z",
    "inspectedAt": "2018-01-21T23:33:48.718Z",
    "type": "ZZZ-commercial",
    "name": "Commercial"
}
expected output:
-------------------------------------------------------------------------------------
| ID                               | LABEL            | VALUE                         |
-------------------------------------------------------------------------------------
| photos_landscaped_area           | Landscaped Area  | [{"description":"Front",...}] |
| fence_condition_missing_sections | Missing Sections | None                          |
-------------------------------------------------------------------------------------
I'm going to assume your data is in a one-document-per-line format and that you provided a formatted example for readability's sake. If this is incorrect, please see the question Multi-line JSON file querying in hive.
When the schema of a JSON document is not entirely regular you can create that column as a string column and use the JSON_* functions to extract values out of it.
First you need to create a table for the raw data:
CREATE EXTERNAL TABLE data (
    fields array<struct<id:string,label:string,value:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://…'
(if you're not interested in the other fields in the JSON documents you can just ignore those when creating the table)
Then you create a view that flattens the data:
CREATE VIEW flat_data AS
SELECT
    field.id,
    field.label,
    field.value
FROM data
CROSS JOIN UNNEST(fields) AS f(field)
Selecting from this view should give you the results you are looking for.
I suspect you are also looking for how to extract properties from the value structure, which is what I alluded to above. Since value can itself hold a JSON array (as in the photo example), parse it and extract the photo URL from each element:
SELECT
    label,
    transform(
        CAST(json_parse(value) AS ARRAY(JSON)),
        element -> json_extract_scalar(element, '$.photo')
    ) AS photo_urls
FROM flat_data
WHERE id = 'photos_landscaped_area'
Look in the Presto documentation for all available JSON functions.

How to update a jsonb column with a text array

I have a jsonb column like so:
{"name": "Toby", "occupation": "Software Engineer", "interests": ""}
Now, I need to update the row and put a text array like ['Volleyball', 'Football', 'Swim'] into the interests field.
What I've tried so far:
UPDATE users SET data = jsonb_set(data, '{interests}', ARRAY['Volleyball', 'Football', 'Swim'], true) WHERE id=84;
data is the jsonb column
But it returns an error:
ERROR: function jsonb_set(jsonb, unknown, integer[], boolean) does not
exist
Hint: No function matches the given name and argument types. You
might need to add explicit type casts.
P.S:
I'm using PostgreSQL 10
The third argument needs to be of JSONB type too.
UPDATE users SET data = jsonb_set(data, '{interests}', '["Volleyball", "Football", "Swim"]'::jsonb, true) WHERE id=84;
This will also work, which is a little closer to your example using ARRAY:
UPDATE users SET data = jsonb_set(data, '{interests}', to_jsonb(array['Volleyball', 'Football', 'Swim']), true) WHERE id=84
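To verify the update, a quick check against the same users table and row from the question:
SELECT id, data -> 'interests' AS interests
FROM users
WHERE id = 84;
-- expected interests: ["Volleyball", "Football", "Swim"]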
