Snowflake - extract JSON array string object values into pipe separated values - arrays

I have a nested JSON array of string objects stored in a VARIANT column of a stage table, and I want to extract a particular value from each string object and populate it as pipe-separated values when more than one object is found. Can someone help me achieve the desired output format, please?
Sample JSON data
{"issues": [
{
"expand": "",
"fields": {
"customfield_10010": [
"com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]",
"com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"
],
"customfield_10011": "obcd",
"customfield_10024": null,
"customfield_10034": null,
"customfield_10035": null,
"customfield_10037": null
},
"id": "123456",
"key": "SUE-1234",
"self": "xyz"
}]}
I don't have any idea how to separate the string objects inside an array with Snowflake.
Using the query below, I can get the whole string converted into pipe-separated values.
select
a.value:id::number as ISSUE_ID,
a.value:key::varchar as ISSUE_KEY,
array_to_string(a.value:fields.customfield_10010, '|') as CF_10010_Data
from
ABC.VARIANT_TABLE,
lateral flatten( input => payload_json:issues) as a;
But I need to extract a particular value from each string object. For example, the id values 1234 and 1239 should be populated pipe-separated as shown below.
ISSUE_ID ISSUE_KEY SPRINT_ID
123456 SUE-1234 1234|1239
Any idea on how to get the desired result is much appreciated. Thanks.

It looks like the data within [...] for your sprints is just details about each sprint. I think it would be easiest for you to actually populate a separate sprints table with data on each sprint, and then join that table to the sprint ID values parsed from the API response you showed with issues data.
with
jira_responses as (
select
$1 as id,
$2 as body
from (values
(1, '{"issues":[{"expand":"","fields":{"customfield_10010":["com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]","com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"],"customfield_10011":"obcd","customfield_10024":null,"customfield_10034":null,"customfield_10035":null,"customfield_10037":null},"id":"123456","key":"SUE-1234","self":"xyz"}]}')
)
)
select
issues.value:id::integer as issue_id,
issues.value:key::string as issue_key,
listagg(regexp_substr(sprints.value::string, 'id=(\\d+)', 1, 1, 'e', 1), '|') as sprint_id
from jira_responses,
lateral flatten(input => parse_json(body):issues) issues,
lateral flatten(input => issues.value:fields:customfield_10010) sprints
group by 1, 2
The REGEXP_SUBSTR call pulls the id=... value out of each bracketed sprint string (issues.value is already a VARIANT, so it doesn't need another PARSE_JSON), and LISTAGG joins the ids with pipes per issue. Based on your sample data, the results would look like the following.
ISSUE_ID ISSUE_KEY SPRINT_ID
123456 SUE-1234 1234|1239
See Snowflake reference docs below.
"Querying Semi-structured Data"
PARSE_JSON
FLATTEN
REGEXP_SUBSTR
LISTAGG
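Since each customfield_10010 entry is a plain string rather than JSON, the id has to be pulled out with a regex. The extraction and pipe-join can be sanity-checked outside Snowflake; a minimal Python sketch using the two Sprint strings from the question:

```python
import re

# The bracketed Sprint strings from the question's customfield_10010 array
sprints = [
    "com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]",
    "com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]",
]

def sprint_id(s: str) -> str:
    # Pull the value of the first id=... pair, mirroring REGEXP_SUBSTR(..., 'id=(\\d+)')
    m = re.search(r"id=(\d+)", s)
    return m.group(1) if m else ""

print("|".join(sprint_id(s) for s in sprints))  # 1234|1239
```

Note that the pattern is case-sensitive, so it does not accidentally match rapidViewId.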

SQL Searching Query inside JSON object array

Consider the below JSON object
[
{
"startdt": "10/13/2021",
"enddt": "10/13/2022",
"customerName1": "John",
"customerName2": "CA"
},
{
"startdt": "10/14/2021",
"enddt": "10/14/2022",
"customerName1": "Jacob",
"customerName2": "NJ"
}
]
This is the value present in the column "custjson" of a table "CustInfo" in a Postgres DB. I want to search the data for the field customerName1. I have created the query below, but it searches the whole object, so that if I search customerName1 for "Jacob" it returns the whole array. I want to search only the matching array element and return just that.
SELECT DISTINCT ON(e.id) e.*,
(jsonb_array_elements(e.custjson)->>'customerName1') AS name1
FROM CustInfo e
CROSS JOIN jsonb_array_elements(e.custjson) ej
WHERE value ->> 'customerName1' LIKE '%Jacob%'
Is there a way to return only the array element whose customerName1 is "Jacob", instead of the whole JSON?
For example, if I search for Jacob, I should get the following instead of the whole JSON:
{
"startdt": "10/14/2021",
"enddt": "10/14/2022",
"customerName1": "Jacob",
"customerName2": "NJ"
}
Any help would be greatly appreciated.
You can use a JSON path expression to find the array element with a matching customer name:
select e.id,
jsonb_path_query_array(e.custjson, '$[*] ? (@.customerName1 like_regex "Jacob")')
from custinfo e
Based on your sample data, this returns:
id | jsonb_path_query_array
---+----------------------------------------------------------------------------------------------------
1 | [{"enddt": "10/14/2022", "startdt": "10/14/2021", "customerName1": "Jacob", "customerName2": "NJ"}]
If you are using an older Postgres version (before version 12, which introduced JSON path support), you need to unnest and aggregate manually:
select e.id,
(select jsonb_agg(element)
from jsonb_array_elements(e.custjson) as x(element)
where x.element ->> 'customerName1' like '%Jacob%')
from custinfo e
This assumes that custjson is defined with the data type jsonb (which it should be). If not, you need to cast it: custjson::jsonb
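For readers who want to check the filtering logic itself, the JSON path expression amounts to keeping only the array elements whose customerName1 matches a regex. A small Python sketch of the same idea (not Postgres-specific), using the sample array from the question:

```python
import json
import re

custjson = json.loads("""
[
  {"startdt": "10/13/2021", "enddt": "10/13/2022", "customerName1": "John", "customerName2": "CA"},
  {"startdt": "10/14/2021", "enddt": "10/14/2022", "customerName1": "Jacob", "customerName2": "NJ"}
]
""")

# Keep only the elements whose customerName1 matches, mirroring
# jsonb_path_query_array(custjson, '$[*] ? (@.customerName1 like_regex "Jacob")')
matches = [e for e in custjson if re.search("Jacob", e.get("customerName1", ""))]
print(json.dumps(matches))
```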

Is it possible to parse json by using select statement in Netezza?

I have JSON data in one of the columns of my table, and I would like to parse it using a select statement in Netezza. I am not able to figure it out.
Can you all help me solve this problem?
Let's say I have TableA, and this table has the column Customer_detail. Data from the customer_detail field looks like this:
'{"Customer":[{"id":"1","name":"mike","address":"NYC"}]}'
Now I would like to query the id from the Customer object of the customer_detail column.
Thanks in advance.
From NPS 11.1.0.0 onwards, you can parse and use json datatype itself in NPS.
Here's an example
SYSTEM.ADMIN(ADMIN)=> create table jtest(c1 jsonb);
CREATE TABLE
SYSTEM.ADMIN(ADMIN)=> insert into jtest values('{"name": "Joe Smith", "age": 28, "sports": ["football", "volleyball", "soccer"]}');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> insert into jtest values('{"name": "Jane Smith", "age": 38, "sports": ["volleyball", "soccer"]}');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> select * from jtest;
C1
----------------------------------------------------------------------------------
{"age": 28, "name": "Joe Smith", "sports": ["football", "volleyball", "soccer"]}
{"age": 38, "name": "Jane Smith", "sports": ["volleyball", "soccer"]}
(2 rows)
SYSTEM.ADMIN(ADMIN)=> select c1 -> 'name' from jtest where c1 -> 'age' > 20::jsonb ;
?COLUMN?
--------------
"Joe Smith"
"Jane Smith"
(2 rows)
You can refer to https://www.ibm.com/support/knowledgecenter/SSTNZ3/com.ibm.ips.doc/postgresql/dbuser/r_dbuser_functions_expressions.html for more details as well.
Looking at the comment you put above, something like
select customer_detail::json -> 'Customer' -> 0 -> 'id' as id,
customer_detail::json -> 'Customer' -> 0 -> 'name' as name
from ...
This will parse the text to JSON on every execution. A more performant approach would be to convert customer_detail to the jsonb datatype.
If the NPS version is below 11.1.x, then the JSON handling needs to be done either (a) externally, i.e. using SQL to get the JSON data and then processing it outside the database, or (b) with a UDF, i.e. creating a UDF that supports JSON parsing.
E.g., using the programming language of your choice, process the JSON outside SQL:
import nzpy  # install using "python3 -m pip install nzpy"
import os
import json

# assume NZ_USER, NZ_PASSWORD, NZ_DATABASE and NZ_HOST are set
con = nzpy.connect(user=os.environ["NZ_USER"],
                   password=os.environ["NZ_PASSWORD"], host=os.environ["NZ_HOST"],
                   database=os.environ["NZ_DATABASE"], port=5480)

with con.cursor() as cur:
    cur.execute('select customer_detail from ...')
    for row in cur.fetchall():
        # each row is a tuple; the JSON text is in the first column
        c = json.loads(row[0])
        print((c['Customer'][0]['name'], c['Customer'][0]['id']))
Or create a UDF that parses json and use that in the SQL query
If neither of those is an option, and the JSON is always well formatted (i.e. no newlines, only one key called "id" and one key called "name", etc.), then a regex may be a way around it, though it's not recommended since it's not a real JSON parser:
select regexp_extract(customer_detail,
'"id"[[:space:]]*:[[:space:]]*"([^"]+)"', 1, 1) as id,
regexp_extract(customer_detail,
'"name"[[:space:]]*:[[:space:]]*"([^"]+)"', 1, 1) as name
....
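The regex approach can be previewed outside the database. A small Python sketch using the same patterns as the SQL above, against the sample customer_detail value (with the same caveat that a regex is not a real JSON parser):

```python
import re

customer_detail = '{"Customer":[{"id":"1","name":"mike","address":"NYC"}]}'

# Same patterns as the regexp_extract calls; capture group 1 holds the value
id_pat = r'"id"[\s]*:[\s]*"([^"]+)"'
name_pat = r'"name"[\s]*:[\s]*"([^"]+)"'

cust_id = re.search(id_pat, customer_detail).group(1)
name = re.search(name_pat, customer_detail).group(1)
print(cust_id, name)  # 1 mike
```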

Create a Nested/Repeating field using SQL in BigQuery which can be queried with dot notation (without UNNEST)

I am trying to build a data structure in BigQuery using SQL which exactly reflects the data structure which I obtain when uploading JSON. This will enable me to query the view using SQL with dot notation instead of having to UNNEST, which I do understand but many of my clients find extremely confusing and unintuitive.
If I build a really simple dummy dataset with a couple of rows and then nest using the ARRAY_AGG(STRUCT([field list])) pattern:
WITH
flat_table AS (
SELECT "BigQuery" AS name, 23 AS user_count, "Data Warehouse" AS data_thing, 5 AS ease_of_use, "Awesome" AS description UNION ALL
SELECT "MySQL" AS name, 12 AS user_count, "Database" AS data_thing, 3 AS ease_of_use, "Solid" AS description
)
SELECT
name, user_count,
ARRAY_AGG(STRUCT(data_thing, ease_of_use, description)) AS attributes
FROM flat_table
GROUP BY name, user_count
Then saving and viewing the schema shows that the attributes field is Type = RECORD and Mode = REPEATED. Schema field names are:
name
user_count
attributes
attributes.data_thing
attributes.ease_of_use
attributes.description
If I look at the COLUMN information in the INFORMATION_SCHEMA.COLUMNS query I can see that the attributes field is_nullable = NO and data_type = ARRAY<STRUCT<data_thing STRING, ease_of_use INT64, description STRING>>
If I want to query this structure I need to use the UNNEST pattern as below:
SELECT
name,
user_count
FROM
nested_table,
UNNEST(attributes)
WHERE
ease_of_use > 3
However when I upload the following JSON representation of the same data to BigQuery with automatic schema detection:
{"attributes":{"description":"Awesome","ease_of_use":5,"data_thing":"Data Warehouse"},"user_count":23,"name":"BigQuery"}
{"attributes":{"description":"Solid","ease_of_use":3,"data_thing":"Database"},"user_count":12,"name":"MySQL"}
The schema looks nearly identical once loaded, except for the attributes field is Mode = NULLABLE (it is still Type = RECORD). The INFORMATION_SCHEMA.COLUMNS shows me that the attributes field is now is_nullable = YES and data_type = STRUCT<data_thing STRING, ease_of_use INT64, description STRING>, i.e. now nullable and not in an array.
However the most interesting thing for me is that I can now query this table using dot notation instead of the UNNEST pattern, so the query above becomes:
SELECT
name,
user_count
FROM
nested_table_json
WHERE
attributes.ease_of_use > 3
Which is arguably easier to read, even in this trivial case. However once we get to more complex data structures with multiple nested fields and multi-level nesting, the UNNEST pattern becomes extremely difficult to write, QA and debug. The dot notation pattern appears to be much more intuitive and scalable.
So my question is: is it possible to build a data structure equivalent to the loaded JSON by writing queries in SQL, enabling us to build Standard SQL queries using dot notation and not requiring complex UNNEST patterns?
If you know that your ARRAY_AGG will produce a single element, you can drop the ARRAY notation like this:
SELECT
name, user_count,
ARRAY_AGG(STRUCT(data_thing, ease_of_use, description))[offset(0)] AS attributes
Notice the use of OFFSET(0); this way the returned output will be:
[
{
"name": "BigQuery",
"user_count": "23",
"attributes": {
"data_thing": "Data Warehouse",
"ease_of_use": "5",
"description": "Awesome"
}
}
]
which can be queried using dot notation.
In case you just want to group the result in a STRUCT, you don't need ARRAY_AGG:
WITH
flat_table AS (
SELECT "BigQuery" AS name, 23 AS user_count, struct("Data Warehouse" AS data_thing, 5 AS ease_of_use, "Awesome" AS description) as attributes UNION ALL
SELECT "MySQL" AS name, 12 AS user_count, struct("Database" AS data_thing, 3 AS ease_of_use, "Solid" AS description)
)
SELECT
*
FROM flat_table
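The difference between the two schemas can be pictured with plain Python objects (a rough analogy, not BigQuery itself): a NULLABLE RECORD behaves like a single dict, so a field is one key lookup away (dot notation in SQL), while a REPEATED RECORD behaves like a list of dicts that has to be iterated, which is what the UNNEST step does:

```python
# NULLABLE RECORD: one struct per row -> direct access (dot notation in SQL)
row_struct = {"name": "BigQuery", "user_count": 23,
              "attributes": {"data_thing": "Data Warehouse",
                             "ease_of_use": 5, "description": "Awesome"}}
print(row_struct["attributes"]["ease_of_use"] > 3)  # True

# REPEATED RECORD: a list of structs per row -> must iterate (UNNEST in SQL)
row_array = {"name": "BigQuery", "user_count": 23,
             "attributes": [{"data_thing": "Data Warehouse",
                             "ease_of_use": 5, "description": "Awesome"}]}
print(any(a["ease_of_use"] > 3 for a in row_array["attributes"]))  # True
```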

SQL Server JSON QUERY

I have to write a query to retrieve data in single JSON object format.
I have mixed columns: a few of them are JSON and a few are numerical values. While retrieving data, I need to convert each row into a single JSON object. My query gives me a single object, but the results contain \" escape slashes. Can someone help me reformat the query below so that the slashes are excluded when formatting each row into a single JSON object?
(select
(select
p.personReferenceNumber as personReferenceNumber,
personIdentity as personIdentity,
isnull([name],'') as [name],
p.gender as gender,
isnull(p.birthDateHijri,'') as birthDateHijri,
isnull(p.birthDateGregorian,'') as birthDateGregorian,
isnull(p.liveStatus,'') as liveStatus,
isnull(p.nationality,'') as nationality,
isnull(p.specific,'') as specific,
isnull(p.isDeleted,'') as isDeleted,
isnull(p.sessionId,'') as sessionId,
isnull(p.insertedBy,'') as insertedBy,
isnull(p.insertedTimeStamp,'') as insertedTimeStamp,
isnull(p.updatedBy,'') as updatedBy,
isnull(p.updatedTimestamp,'') as updatedTimestamp
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS person
from person p)
Output:
{\"messageId\":\"f616dbd3-1352-404b-939e-5b12f90b57fe\",
\"transactionRowId\":594834948322275328,
\"personId\":\"bebox13\",
\"idType\":2,\"issueDateHijri\":14400203,
\"issueDateGregorian\":\"2019-04-12T03:00:00\",
\"expiryDateHijri\":14400104,
\"expiryDateGregorian\":\"2019-04-12T03:00:00\",
\"issuePlace\":[\"Update\",
\"Update\"]}",
"name":
"{\\\"messageId\\\":\\\"f616dbd3-1352-404b-939e-5b12f90b57fe\\\",
\\\"transactionRowId\\\":594834948322275328,
\\\"firstName\\\":[\\\"Update\\\",\\\"Update\\\"],\\\"secondName\\\":[\\\"Update\\\",\\\"Update\\\"],\\\"thirdName\\\":[\\\"Update\\\",\\\"Update\\\"],\\\"familyName\\\":[\\\"Update\\\",\\\"Update\\\"]}"
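The backslashes typically appear when a column that already holds serialized JSON text is embedded as a plain string, so every quote gets escaped a second time; in T-SQL the usual remedy is to wrap such columns in JSON_QUERY() so FOR JSON emits them as JSON rather than as text. The escaping mechanics are easy to reproduce; a minimal Python illustration (the column name "person" is just for the example):

```python
import json

inner = '{"messageId": "f616dbd3"}'  # a column that already holds JSON text

# Embedding the raw string re-escapes every quote -> the \" noise from the question
escaped = json.dumps({"person": inner})
print(escaped)  # {"person": "{\"messageId\": \"f616dbd3\"}"}

# Parsing it first embeds a JSON object -> clean output
# (this is the effect JSON_QUERY has inside FOR JSON in T-SQL)
clean = json.dumps({"person": json.loads(inner)})
print(clean)  # {"person": {"messageId": "f616dbd3"}}
```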

How to create a table in Hive with a column of data type array<map<string, string>>

I am trying to create a table which has complex data types, listed below:
array
map
array< map < String,String> >
I am trying to create a data structure with these 3 types. Is this possible in Hive? My table DDL looks like below.
create table complexTest(names array<String>,infoMap map<String,String>, deatils array<map<String,String>>)
row format delimited
fields terminated by '/'
collection items terminated by '|'
map keys terminated by '='
lines terminated by '\n';
And my sample data looks like below.
Abhieet|Test|Complex/Name=abhi|age=31|Sex=male/Name=Test,age=30,Sex=male|Name=Complex,age=30,Sex=female
Whenever I query the data from the table, I get the values below:
["Abhieet"," Test"," Complex"] {"Name":"abhi","age":"31","Sex":"male"} [{"Name":null,"Test,age":null,"31,Sex":null,"male":null},{"Name":null,"Complex,age":null,"30,Sex":null,"female":null}]
This is not what I am expecting. Could you please help me figure out what the DDL should be, if this is at all possible, for the data type array<map<String,String>>?
I don't think this is possible using the inbuilt serde. If you know in advance what the values in your maps are going to be, then I think a better way of approaching this would be to convert your input data to JSON, and then use the Hive json serde:
Sample data:
{"names": ["Abhieet", "Test", "Complex"],
"infoMap": {"Sex": "male", "Name": "abhi", "age": "31"},
"details": [{"Sex": "male", "Name": "Test", "age": "30"}, {"Sex": "female", "Name": "Complex", "age": "30"}]
}
Table definition code:
create table complexTest
(
names array<string>,
infomap struct<Name:string,
age:string,
Sex:string>,
details array<struct<Name:string,
age:string,
Sex:string>>
)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
This can be handled with an array of structs using the following query.
create table complexStructArray(custID String, nameValuePairs array<struct<key:String, value:String>>)
row format delimited
fields terminated by '/'
collection items terminated by '|'
map keys terminated by '='
lines terminated by '\n';
Sample data:
101/Name=Madhavan|age=30
102/Name=Ramkumar|age=31
Though struct allows duplicate key values, unlike map, the above query should handle the ask as long as the data has unique key values.
A select query would give the output as follows.
hive> select * from complexStructArray;
101 [{"key":"Name","value":"Madhavan"},{"key":"age","value":"30"}]
102 [{"key":"Name","value":"Ramkumar"},{"key":"age","value":"31"}]
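The three-delimiter layout above can be checked with a small Python parser that applies the same splits (/ for fields, | for collection items, = for key/value), mirroring how the delimited serde would read each line:

```python
def parse_line(line: str):
    # '/' separates fields, '|' separates collection items, '=' splits key from value
    cust_id, pairs = line.split("/", 1)
    name_value_pairs = [{"key": k, "value": v}
                        for k, v in (item.split("=", 1) for item in pairs.split("|"))]
    return cust_id, name_value_pairs

print(parse_line("101/Name=Madhavan|age=30"))
# ('101', [{'key': 'Name', 'value': 'Madhavan'}, {'key': 'age', 'value': '30'}])
```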
Sample data: {"Name": ["Abhieet", "Test", "Complex"],"infoMap": {"Sex":"male", "Name":"abhi", "age":31},"details": [{"Sex":"male", "Name":"Test", "age":30}, {"Sex":"female", "Name":"Complex", "age":30}]}
Table definition code:
hive> create table complexTest
(
names array<string>,
infomap struct<Name:string, age:string, Sex:string>,
details array<struct<Name:string, age:string, Sex:string>>
)
row format serde 'org.apache.hive.hcatalog.data.JsonSerDe';
