Is it possible to parse JSON using a select statement in Netezza?

I have JSON data in one of the columns of my table and I would like to parse it with a select statement in Netezza, but I am not able to figure it out.
Can you all help me solve this problem?
Let's say I have TableA with a column Customer_detail. The data in the customer_detail field looks like this:
'{"Customer":[{"id":"1","name":"mike","address":"NYC"}]}'
Now I would like to query the id from the Customer object of the customer_detail column.
Thanks in advance.

From NPS 11.1.0.0 onwards, you can parse and use the json data type natively in NPS.
Here's an example:
SYSTEM.ADMIN(ADMIN)=> create table jtest(c1 jsonb);
CREATE TABLE
SYSTEM.ADMIN(ADMIN)=> insert into jtest values('{"name": "Joe Smith", "age": 28, "sports": ["football", "volleyball", "soccer"]}');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> insert into jtest values('{"name": "Jane Smith", "age": 38, "sports": ["volleyball", "soccer"]}');
INSERT 0 1
SYSTEM.ADMIN(ADMIN)=> select * from jtest;
C1
----------------------------------------------------------------------------------
{"age": 28, "name": "Joe Smith", "sports": ["football", "volleyball", "soccer"]}
{"age": 38, "name": "Jane Smith", "sports": ["volleyball", "soccer"]}
(2 rows)
SYSTEM.ADMIN(ADMIN)=> select c1 -> 'name' from jtest where c1 -> 'age' > 20::jsonb ;
?COLUMN?
--------------
"Joe Smith"
"Jane Smith"
(2 rows)
You can refer to https://www.ibm.com/support/knowledgecenter/SSTNZ3/com.ibm.ips.doc/postgresql/dbuser/r_dbuser_functions_expressions.html for more details as well.
Looking at the comment you posted above, you can use something like:
select customer_detail::json -> 'Customer' -> 0 -> 'id' as id,
       customer_detail::json -> 'Customer' -> 0 -> 'name' as name
from ...
This will parse the text into json on every execution. A more performant approach would be to convert customer_detail to the jsonb data type.
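For example, a minimal sketch of that one-time conversion, assuming a hypothetical target table name (and that any other columns you need are listed explicitly), could look like:
-- hypothetical sketch: materialize customer_detail as jsonb once,
-- so later queries don't re-parse the text on every execution
create table tablea_jsonb as
select customer_id,                              -- hypothetical extra column; list whatever you need
       customer_detail::jsonb as customer_detail
from TableA;

select customer_detail -> 'Customer' -> 0 -> 'id' as id
from tablea_jsonb;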
If the NPS version is below 11.1.x, then the json handling needs to be done either (a) externally, i.e. using SQL to fetch the json data and then processing it outside the database, or (b) with a UDF that supports json parsing.
For example, using the programming language of your choice, process the json outside of SQL:
import nzpy  # install using "python3 -m pip install nzpy"
import os
import json

# assume NZ_USER, NZ_PASSWORD, NZ_DATABASE and NZ_HOST are set
con = nzpy.connect(user=os.environ["NZ_USER"],
                   password=os.environ["NZ_PASSWORD"], host=os.environ["NZ_HOST"],
                   database=os.environ["NZ_DATABASE"], port=5480)

with con.cursor() as cur:
    cur.execute('select customer_detail from ...')
    # each row comes back as a sequence; customer_detail is the first column
    for row in cur.fetchall():
        c = json.loads(row[0])
        print((c['Customer'][0]['name'], c['Customer'][0]['id']))
Or create a UDF that parses json and use that in the SQL query
If none of those are options, and the json is always well formatted (i.e. no new lines, only one key called "id" and one key called "name", etc.), then a regex may be a way around it, though this is not recommended since it is not a real json parser:
select regexp_extract(customer_detail,
'"id"[[:space:]]*:[[:space:]]*"([^"]+)"', 1, 1) as id,
regexp_extract(customer_detail,
'"name"[[:space:]]*:[[:space:]]*"([^"]+)"', 1, 1) as name
....

Related

SQL Searching Query inside JSON object array

Consider the below JSON object
[
  {
    "startdt": "10/13/2021",
    "enddt": "10/13/2022",
    "customerName1": "John",
    "customerName2": "CA"
  },
  {
    "startdt": "10/14/2021",
    "enddt": "10/14/2022",
    "customerName1": "Jacob",
    "customerName2": "NJ"
  }
]
This is the value present in the column "custjson" of a table "CustInfo" in a Postgres DB. I want to search the data for the field customerName1. I created the query below, but it searches the whole object, so that if I search customerName1 for "Jacob", it returns the whole array. I want to search only the matching array element and return just that.
SELECT DISTINCT ON(e.id) e.*,
(jsonb_array_elements(e.custjson)->>'customerName1') AS name1
FROM CustInfo e
CROSS JOIN jsonb_array_elements(e.custjson) ej
WHERE value ->> 'customerName1' LIKE '%Jacob%'
Is there a way to search only the array element whose customerName1 is "Jacob", instead of the whole json?
For example, if I search for Jacob I should get the following instead of the whole JSON:
{
  "startdt": "10/14/2021",
  "enddt": "10/14/2022",
  "customerName1": "Jacob",
  "customerName2": "NJ"
}
Any help would be greatly appreciated.
You can use a JSON path expression to find the array element with a matching customer name:
select e.id,
       jsonb_path_query_array(e.custjson, '$[*] ? (@.customerName1 like_regex "Jacob")')
from custinfo e
Based on your sample data, this returns:
id | jsonb_path_query_array
---+----------------------------------------------------------------------------------------------------
1 | [{"enddt": "10/14/2022", "startdt": "10/14/2021", "customerName1": "Jacob", "customerName2": "NJ"}]
If you are using an older Postgres version that doesn't support JSON path queries, you need to unnest and aggregate manually:
select e.id,
       (select jsonb_agg(element)
        from jsonb_array_elements(e.custjson) as x(element)
        where x.element ->> 'customerName1' like '%Jacob%')
from custinfo e
This assumes that custjson is defined with the data type jsonb (which it should be). If not, you need to cast it: custjson::jsonb
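If you also want to skip rows that have no matching element at all (rather than getting an empty array for them), one possible variation, sketched here assuming Postgres 12+ and the same custinfo/custjson names, repeats the path in the WHERE clause with the @? operator:
-- sketch: keep only rows whose array contains a matching element (Postgres 12+)
select e.id,
       jsonb_path_query_array(e.custjson, '$[*] ? (@.customerName1 like_regex "Jacob")') as matches
from custinfo e
where e.custjson @? '$[*] ? (@.customerName1 like_regex "Jacob")';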

Snowflake - extract JSON array string object values into pipe separated values

I have a nested JSON array, stored as a string object in a variant-type stage table, and I want to extract a particular value from each string object and populate the results as pipe-separated values when more than one object is found. Can someone help me achieve the desired output format, please?
Sample JSON data
{"issues": [
{
"expand": "",
"fields": {
"customfield_10010": [
"com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]",
"com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"
],
"customfield_10011": "obcd",
"customfield_10024": null,
"customfield_10034": null,
"customfield_10035": null,
"customfield_10037": null,
},
"id": "123456",
"key": "SUE-1234",
"self": "xyz"
}]}
I don't have any idea how to separate the string objects inside an array with Snowflake.
Using the query below I can get the whole string converted into pipe-separated values:
select
    a.value:id::number as ISSUE_ID,
    a.value:key::varchar as ISSUE_KEY,
    array_to_string(a.value:fields.customfield_10010, '|') as CF_10010_Data
from
    ABC.VARIANT_TABLE,
    lateral flatten( input => payload_json:issues) as a;
But I need to extract a particular value from within each string object. For example, the id values such as 1234 and 1239 should be populated pipe-separated, as shown below.
ISSUE_ID ISSUE_KEY SPRINT_ID
123456 SUE-1234 1234|1239
Any idea on how to get the desired result is much appreciated. Thanks.
It looks like the data within [...] for your sprints is just details about that sprint. I think it would be easiest for you to populate a separate sprints table with data on each sprint, and then join that table to the sprint identifiers parsed from the issues data in the API response you showed.
with
jira_responses as (
select
$1 as id,
$2 as body
from (values
(1, '{"issues":[{"expand":"","fields":{"customfield_10010":["com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]","com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"],"customfield_10011":"obcd","customfield_10024":null,"customfield_10034":null,"customfield_10035":null,"customfield_10037":null},"id":"123456","key":"SUE-1234","self":"xyz"}]}')
)
)
select
    issues.value:id::integer as issue_id,
    issues.value:key::string as issue_key,
    get(split(sprints.value::string, '['), 0)::string as sprint_id
from jira_responses,
    lateral flatten(input => parse_json(body):issues) issues,
    lateral flatten(input => parse_json(issues.value):fields:customfield_10010) sprints
Based on your sample data, this returns one row per sprint, with the Sprint# identifier extracted as sprint_id.
See Snowflake reference docs below.
"Querying Semi-structured Data"
PARSE_JSON
FLATTEN
SPLIT
GET
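If you specifically want the numeric id= values pipe-joined per issue, as in the desired output above, a rough sketch against the original ABC.VARIANT_TABLE (assuming REGEXP_SUBSTR with the 'e' parameter to extract the first capture group, and LISTAGG to pipe-join values per issue) might look like:
-- sketch: extract the id=<number> part of each sprint string, then pipe-join per issue
select
    a.value:id::number as issue_id,
    a.value:key::varchar as issue_key,
    listagg(regexp_substr(s.value::string, 'id=([0-9]+)', 1, 1, 'e'), '|') as sprint_id
from ABC.VARIANT_TABLE,
    lateral flatten(input => payload_json:issues) a,
    lateral flatten(input => a.value:fields.customfield_10010) s
group by 1, 2;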

Create a Nested/Repeating field using SQL in BigQuery which can be queried with dot notation (without UNNEST)

I am trying to build a data structure in BigQuery using SQL which exactly reflects the data structure which I obtain when uploading JSON. This will enable me to query the view using SQL with dot notation instead of having to UNNEST, which I do understand but many of my clients find extremely confusing and unintuitive.
If I build a really simple dummy dataset with a couple of rows and then nest using the ARRAY_AGG(STRUCT([field list])) pattern:
WITH
flat_table AS (
SELECT "BigQuery" AS name, 23 AS user_count, "Data Warehouse" AS data_thing, 5 AS ease_of_use, "Awesome" AS description UNION ALL
SELECT "MySQL" AS name, 12 AS user_count, "Database" AS data_thing, 3 AS ease_of_use, "Solid" AS description
)
SELECT
name, user_count,
ARRAY_AGG(STRUCT(data_thing, ease_of_use, description)) AS attributes
FROM flat_table
GROUP BY name, user_count
Then saving and viewing the schema shows that the attributes field is Type = RECORD and Mode = REPEATED. Schema field names are:
name
user_count
attributes
attributes.data_thing
attributes.ease_of_use
attributes.description
If I look at the COLUMN information in the INFORMATION_SCHEMA.COLUMNS query I can see that the attributes field is_nullable = NO and data_type = ARRAY<STRUCT<data_thing STRING, ease_of_use INT64, description STRING>>
If I want to query this structure I need to use the UNNEST pattern as below:
SELECT
name,
user_count
FROM
nested_table,
UNNEST(attributes)
WHERE
ease_of_use > 3
However when I upload the following JSON representation of the same data to BigQuery with automatic schema detection:
{"attributes":{"description":"Awesome","ease_of_use":5,"data_thing":"Data Warehouse"},"user_count":23,"name":"BigQuery"}
{"attributes":{"description":"Solid","ease_of_use":3,"data_thing":"Database"},"user_count":12,"name":"MySQL"}
The schema looks nearly identical once loaded, except for the attributes field is Mode = NULLABLE (it is still Type = RECORD). The INFORMATION_SCHEMA.COLUMNS shows me that the attributes field is now is_nullable = YES and data_type = STRUCT<data_thing STRING, ease_of_use INT64, description STRING>, i.e. now nullable and not in an array.
However the most interesting thing for me is that I can now query this table using dot notation instead of the UNNEST pattern, so the query above becomes:
SELECT
name,
user_count
FROM
nested_table_json
WHERE
attributes.ease_of_use > 3
Which is arguably easier to read, even in this trivial case. However once we get to more complex data structures with multiple nested fields and multi-level nesting, the UNNEST pattern becomes extremely difficult to write, QA and debug. The dot notation pattern appears to be much more intuitive and scalable.
So my question is: is it possible to build a data structure equivalent to the loaded JSON by writing queries in SQL, enabling us to build Standard SQL queries using dot notation and not requiring complex UNNEST patterns?
If you know that your array_agg will produce one row, you can drop the ARRAY notation like this:
SELECT
name, user_count,
ARRAY_AGG(STRUCT(data_thing, ease_of_use, description))[offset(0)] AS attributes
Notice the use of OFFSET(0); this way the returned output will be:
[
  {
    "name": "BigQuery",
    "user_count": "23",
    "attributes": {
      "data_thing": "Data Warehouse",
      "ease_of_use": "5",
      "description": "Awesome"
    }
  }
]
This can be queried using dot notation.
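For instance, assuming that result is saved as a table or view named nested_table_struct (a hypothetical name), the earlier filter can be written with dot notation:
SELECT
  name,
  user_count
FROM
  nested_table_struct
WHERE
  attributes.ease_of_use > 3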
In case you just want to group the result into a STRUCT, you don't need ARRAY_AGG:
WITH
flat_table AS (
SELECT "BigQuery" AS name, 23 AS user_count, struct("Data Warehouse" AS data_thing, 5 AS ease_of_use, "Awesome" AS description) as attributes UNION ALL
SELECT "MySQL" AS name, 12 AS user_count, struct("Database" AS data_thing, 3 AS ease_of_use, "Solid" AS description)
)
SELECT
*
FROM flat_table

How to import documents that have arrays with the Cosmos DB Data Migration Tool

I'm trying to import documents from a SQL Server database. Each document will have a list of products that a customer has bought, for example:
{
  "name": "John Smith",
  "products": [
    {
      "name": "Pencil Sharpener",
      "description": "Something, this and that."
    },
    {
      "name": "Pencil case",
      "description": "A case for pencils."
    }
  ]
}
In the SQL Server database, the customer and products are stored in separate tables with a one-to-many relationship between the customer and products:
Customer
    Id INT
    Name VARCHAR
Product
    Id INT
    CustomerId INT (FK)
    Name VARCHAR
    Description VARCHAR
I've checked through the documentation, but can't see any mention of how to write the SQL query to map the one-to-many relationships to a single document.
I think there may be a way to do it, as the Target Information step (when selecting DocumentDB - Bulk import (single partition collections)) offers the option to provide a bulk insert stored procedure. Maybe the products can be assigned to the document's products array from within there. I'm just not sure how to go about it, as I'm new to Cosmos DB.
I hope that's clear enough and thanks in advance for your help!
It seems that you'd like the products info returned formatted as JSON when you import data from SQL Server using the Azure Cosmos DB: DocumentDB API Data Migration Tool. Based on your customer and products table structure and your requirement, I did the following test, which works fine on my side. You can refer to it.
Import data from SQL Server to JSON file
Query
select distinct c.Name,
       (select p.Name as [name], p.[Description] as [description]
        from [dbo].[Product] p
        where c.Id = p.CustomerId
        for json path) as products
from [dbo].[Customer] c
JSON output
[
  {
    "Name": "Jack",
    "products": null
  },
  {
    "Name": "John Smith",
    "products": "[{\"name\":\"Pencil Sharpener\",\"description\":\"Something, this and that\"},{\"name\":\"Pencil case\",\"description\":\"A case for pencils.\"}]"
  }
]
Parsing the products
On the 'Target Information' step, you'll need to use your own version of BulkTransformationInsert.js. On line 32 there is a 'transformDocument' function where you can edit the document. The following will parse the products and then assign them back to the document before returning:
function transformDocument(doc) {
    if (doc["products"]) {
        let products = doc["products"];
        let productsArr = JSON.parse(products);
        doc["products"] = productsArr;
    }
    return doc;
}

How to Update Doc in Cloudant no sql-db

I am following this link to integrate Cloudant NoSQL DB.
There are methods given for create DB, find all, count, search, and update. Now I want to update one key's value in one of the documents in my DB. How can I achieve that? The documentation shows:
updateDoc (name, doc)
Arguments:
name - database name
docID - document to update
But when I pass my database name and doc ID, it throws an error that the database is already created and cannot be created again. However, I want to update the doc. Can anyone help me out?
Below is one of the docs of my table 'employee_table' for reference:
{
  "_id": "0b6459f8d368db408140ddc09bb30d19",
  "_rev": "1-6fe6413eef59d0b9c5ab5344dc642bb1",
  "Reporting_Manager": "sdasd",
  "Designation": "asdasd",
  "Access_Level": 2,
  "Employee_ID": 123123,
  "Employee_Name": "Suhas",
  "Project_Name": "asdasd",
  "Password": "asda",
  "Location": "asdasd",
  "Project_Manager": "asdas"
}
So I want to update some values in the above doc in my table 'employee_table'. What parameters do I have to pass to update it?
First of all, there is no concept named "table" in the NoSQL world.
Second, to update a document you first need to fetch it based on some field of the document; you can use Employee_ID or any other document field. Then use database.get_query_result:
db_name = 'Employee'
# throw_on_exists=False means: don't throw an error if the DB is already present
database = client.create_database(db_name, throw_on_exists=False)
EmployeeIDValue = '123123'

def updateDoc(database, EmployeeIDValue):
    results = database.get_query_result(
        selector={'Employee_ID': {'$eq': EmployeeIDValue}},
    )
    for result in results:
        my_doc_id = result["_id"]
        my_doc = database[my_doc_id]  # this gives you your document
        # now you can do the update
        my_doc['Employee_Name'] = 'XYZ'
        my_doc.save()  # this command updates the current document
