postgresql json array query - arrays

I tried to query my JSON array using the example here: How do I query using fields inside the new PostgreSQL JSON datatype?
They use the example:
SELECT *
FROM json_array_elements(
'[{"name": "Toby", "occupation": "Software Engineer"},
{"name": "Zaphod", "occupation": "Galactic President"} ]'
) AS elem
WHERE elem->>'name' = 'Toby';
But my JSON array looks more like this (using the same example):
{
  "people": [
    {
      "name": "Toby",
      "occupation": "Software Engineer"
    },
    {
      "name": "Zaphod",
      "occupation": "Galactic President"
    }
  ]
}
But I get an error: ERROR: cannot call json_array_elements on a non-array
Is my JSON "array" not really an array? I have to use this JSON string as-is because it comes from a database, so I would have to tell them to fix it if it's not an array.
Or is there another way to query it?
I read the documentation, but nothing I tried worked; I kept getting errors.

The JSON array is stored under the key people, so pass my_json->'people' to the function:
with my_table(my_json) as (
  values (
    '{
      "people": [
        {
          "name": "Toby",
          "occupation": "Software Engineer"
        },
        {
          "name": "Zaphod",
          "occupation": "Galactic President"
        }
      ]
    }'::json
  )
)
select t.*
from my_table t
cross join json_array_elements(my_json->'people') elem
where elem->>'name' = 'Toby';
The function json_array_elements() unnests the JSON array and generates one row per element:
select elem->>'name' as name, elem->>'occupation' as occupation
from my_table
cross join json_array_elements(my_json->'people') elem;
name | occupation
--------+--------------------
Toby | Software Engineer
Zaphod | Galactic President
(2 rows)
If you are interested in Toby's occupation:
select elem->>'occupation' as occupation
from my_table
cross join json_array_elements(my_json->'people') elem
where elem->>'name' = 'Toby';
occupation
-------------------
Software Engineer
(1 row)
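If the column is jsonb rather than json, the same approach works with jsonb_array_elements(); a minimal sketch, reusing the table above:
select elem->>'name' as name, elem->>'occupation' as occupation
from my_table
cross join jsonb_array_elements(my_json::jsonb->'people') elem
where elem->>'name' = 'Toby';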

Related

SQL Server table data to JSON Path result

I am looking for a solution to convert table results to JSON via FOR JSON PATH.
I have a table with two columns as below. Column 1 will always have a single value, but column 2 will have up to 15 values separated by ';' (semicolon).
ID   Column1   Column2
--------------------------------------
1    T1        Re;BoRe;Va
I want to convert the above column data in to below JSON Format
{
  "services": [
    { "service": "T1" }
  ],
  "additional_services": [
    { "service": "Re" },
    { "service": "BoRe" },
    { "service": "Va" }
  ]
}
I have tried something like the below, but cannot get to the exact format that I am looking for:
SELECT
  REPLACE((SELECT d.Column1 AS services, d.column2 AS additional_services
           FROM Table1 w (nolock)
           INNER JOIN Table2 d (nolock) ON w.Id = d.Id
           WHERE ID = 1
           FOR JSON PATH), '\/', '/')
Please let me know if this is something we can achieve using T-SQL.
As I mention in the comments, I strongly recommend you fix and normalise your design. Don't store delimited data in your database; Re;BoRe;Va should be 3 rows, not 1 delimited one. That doesn't mean you can't achieve what you want with your denormalised data, just that the design is flawed and needs to be pointed out.
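For illustration, a normalised layout might look something like this (table and column names are hypothetical):
CREATE TABLE dbo.OrderService
(
    ID int NOT NULL,
    Service varchar(20) NOT NULL       -- the Column1 value, one row per service
);
CREATE TABLE dbo.OrderAdditionalService
(
    ID int NOT NULL,                   -- references OrderService.ID
    Service varchar(20) NOT NULL       -- one row per additional service
);
-- Re;BoRe;Va becomes three rows:
INSERT INTO dbo.OrderAdditionalService (ID, Service)
VALUES (1, 'Re'), (1, 'BoRe'), (1, 'Va');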
One way to achieve what you're after is with some nested FOR JSON calls:
SELECT (SELECT V.Column1 AS service
        FOR JSON PATH) AS services,
       (SELECT SS.[value] AS service
        FROM STRING_SPLIT(V.Column2,';') SS
        FOR JSON PATH) AS additional_services
FROM (VALUES(1,'T1','Re;BoRe;Va'))V(ID,Column1,Column2)
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;
This results in the following JSON:
{
  "services": [
    {
      "service": "T1"
    }
  ],
  "additional_services": [
    {
      "service": "Re"
    },
    {
      "service": "BoRe"
    },
    {
      "service": "Va"
    }
  ]
}
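Applied to the tables in the question, the same pattern would look roughly like this (a sketch; it assumes Column1 and Column2 live on Table2, as in the original attempt, and that STRING_SPLIT is available, i.e. SQL Server 2016+ with compatibility level 130 or higher):
SELECT (SELECT d.Column1 AS service
        FOR JSON PATH) AS services,
       (SELECT SS.[value] AS service
        FROM STRING_SPLIT(d.Column2, ';') SS
        FOR JSON PATH) AS additional_services
FROM Table1 w
INNER JOIN Table2 d ON w.Id = d.Id
WHERE d.ID = 1
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;
Note that WITHOUT_ARRAY_WRAPPER only makes sense here if the query returns a single row.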

Loading JSON into BigQuery: Field is sometimes an array and sometimes a string

I'm trying to load JSON data to BigQuery. The excerpt of my data causing problems looks like this:
[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]
{"Value":"456","Code":"A"}
[{"Value":"123","Code":"A"},{"Value":"789","Code":"C"},{"Value":"000","Code":"B"}]
{"Value":"Z","Code":"A"}
I have defined the schema for this field to be:
{
  "fields": [
    {
      "mode": "NULLABLE",
      "name": "Code",
      "type": "STRING"
    },
    {
      "mode": "NULLABLE",
      "name": "Value",
      "type": "STRING"
    }
  ],
  "mode": "REPEATED",
  "name": "Properties",
  "type": "RECORD"
}
But I'm having trouble successfully extracting the string and array values into one repeated field. This SQL will successfully extract the string values:
JSON_EXTRACT_SCALAR(json_string,'$.Properties.Code') as Code,
JSON_EXTRACT_SCALAR(json_string,'$.Properties.Value') as Value
And this SQL will successfully extract the array values:
ARRAY(
  SELECT
    STRUCT(
      JSON_EXTRACT_SCALAR(Properties_Array,'$.Code') AS Code,
      JSON_EXTRACT_SCALAR(Properties_Array,'$.Value') AS Value
    )
  FROM UNNEST(JSON_EXTRACT_ARRAY(json_string,'$.Properties')) Properties_Array
) AS Properties
I am trying to find a way to have BigQuery read this string as a one-element array instead of preprocessing the data. Is this possible in #standardSQL?
The example below is for BigQuery Standard SQL:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"000","Code":"B"}]}' json_string UNION ALL
  SELECT '{"Properties":{"Value":"456","Code":"A"}}' UNION ALL
  SELECT '{"Properties":[{"Value":"123","Code":"A"},{"Value":"789","Code":"C"},{"Value":"000","Code":"B"}]}' UNION ALL
  SELECT '{"Properties": {"Value":"Z","Code":"A"}}'
)
SELECT json_string,
  ARRAY(
    SELECT STRUCT(
      JSON_EXTRACT_SCALAR(Properties,'$.Code') AS Code,
      JSON_EXTRACT_SCALAR(Properties,'$.Value') AS Value
    )
    FROM UNNEST(IFNULL(
      JSON_EXTRACT_ARRAY(json_string,'$.Properties'),
      [JSON_EXTRACT(json_string,'$.Properties')])) Properties
  ) AS Properties
FROM `project.dataset.table`
with output where each row's Properties ends up as a repeated record, whether the source JSON held an array or a single object. The key is the IFNULL(): when Properties is a plain object, JSON_EXTRACT_ARRAY() returns NULL, and the single object extracted by JSON_EXTRACT() is wrapped in a one-element array instead.
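A quick check (not from the original answer) of what each extraction returns for an object-valued row, which is why the IFNULL() branch fires:
#standardSQL
SELECT
  JSON_EXTRACT_ARRAY('{"Properties":{"Value":"456","Code":"A"}}', '$.Properties') AS as_array,  -- NULL: Properties is not an array
  JSON_EXTRACT('{"Properties":{"Value":"456","Code":"A"}}', '$.Properties') AS as_object        -- {"Value":"456","Code":"A"}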

OPENJSON - How to extract value from JSON object saved as NVARCHAR in SQL Server

There is a column RawData of type NVARCHAR which contains JSON objects as strings:
RawData
-------------------------------------------------------
{"ID":1,--other key/value(s)--,"object":{--object1--}}
{"ID":2,--other key/value(s)--,"object":{--object2--}}
{"ID":3,--other key/value(s)--,"object":{--object3--}}
{"ID":4,--other key/value(s)--,"object":{--object4--}}
{"ID":5,--other key/value(s)--,"object":{--object5--}}
These JSON strings are big (about 1 KB each), and currently the most used part is the object key (about 200 bytes).
I want to extract the object part of these JSON strings using OPENJSON. I have not been able to work out a solution, but I think there is one.
The result that i want is:
RawData
----------------
{--object1--}
{--object2--}
{--object3--}
{--object4--}
{--object5--}
My attempt so far:
SELECT *
FROM OPENJSON((SELECT RawData From DATA_TB FOR JSON PATH))
Looks like this should work for you.
Sample data
create table data_tb
(
  RawData nvarchar(max)
);
insert into data_tb (RawData) values
('{"ID":1, "key": "value1", "object":{ "name": "alfred" }}'),
('{"ID":2, "key": "value2", "object":{ "name": "bert" }}'),
('{"ID":3, "key": "value3", "object":{ "name": "cecil" }}'),
('{"ID":4, "key": "value4", "object":{ "name": "dominique" }}'),
('{"ID":5, "key": "value5", "object":{ "name": "elise" }}');
Solution
select d.RawData, json_query(d.RawData, '$.object') as Object
from data_tb d;
See it in action: fiddle.
Something like this:
SELECT object
FROM DATA_TB as dt
CROSS APPLY
OPENJSON(dt.RawData) with (object nvarchar(max) as json);
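If other top-level keys are needed at the same time, the WITH clause can list them alongside the nested object (a sketch against the sample table above; note that OPENJSON requires SQL Server 2016+ and a database compatibility level of at least 130):
SELECT j.ID, j.[object]
FROM data_tb AS dt
CROSS APPLY OPENJSON(dt.RawData)
WITH (
    ID int '$.ID',
    [object] nvarchar(max) '$.object' AS JSON
) AS j;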

Query JSON Key:Value Pairs in AWS Athena

I have received a data set from a client that is loaded in AWS S3. The data contains unnamed JSON key:value pairs. This isn't my area of expertise, so I was looking for a little help.
The structure of JSON data that I've typically worked with in the past looks similar to this:
{ "name":"John", "age":30, "car":null }
The data that I have received from my client is formatted as such:
{
  "answer_id": "cc006",
  "answer": {
    "101086": 1,
    "101087": 2,
    "101089": 2,
    "101090": 7,
    "101091": 5,
    "101092": 3,
    "101125": 2
  }
}
This is survey data, where the key on the left is a numeric customer identifier, and the value on the right is their response to a survey question, i.e. customer "101125" answered the survey with a value of "2". I need to be able to query the JSON data using Athena such that my result set looks similar to a three-column layout of answer_id, key, and value.
Cross joining the unnested children against the parent node isn't an issue. What I can't figure out is how to select all of the keys from the "answer" object without specifying each actual key name. Similarly, I want to be able to select all of the values as well.
Is it possible to create a virtual table in Athena that would allow for these results, or do I need to convert the JSON to a format this looks more similar to the following:
{
  "answer_id": "cc006",
  "answer": [
    { "key": "101086", "value": 1 },
    { "key": "101087", "value": 2 },
    { "key": "101089", "value": 2 },
    { "key": "101090", "value": 7 },
    { "key": "101091", "value": 5 },
    { "key": "101092", "value": 3 },
    { "key": "101125", "value": 2 }
  ]
}
EDIT 6/4/2020
I was able to use the code that Theon provided below along with the following table structure:
CREATE EXTERNAL TABLE answer_example (
  answer_id string,
  answer string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://mybucket/'
That allowed me to use the following query to generate the results that I needed.
WITH Data AS (
  SELECT
    answer_id,
    CAST(json_extract(answer, '$') AS MAP(VARCHAR, VARCHAR)) AS answer
  FROM
    answer_example
)
SELECT
  answer_id,
  key,
  element_at(answer, key) AS value
FROM
  Data
CROSS JOIN UNNEST (map_keys(answer)) AS answer (key)
EDIT 6/5/2020
Taking additional advice from Theon's response below, the following DDL and Query simplify this quite a bit.
DDL:
CREATE EXTERNAL TABLE answer_example (
  answer_id string,
  answer map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://mybucket/'
Query:
SELECT
  answer_id,
  key,
  element_at(answer, key) AS value
FROM
  answer_example
CROSS JOIN UNNEST (map_keys(answer)) AS answer (key)
You can cross join with the keys of the answer property and then pick the corresponding value. Something like this:
WITH data AS (
  SELECT
    'cc006' AS answer_id,
    MAP(
      ARRAY['101086', '101087', '101089', '101090', '101091', '101092', '101125'],
      ARRAY[1, 2, 2, 7, 5, 3, 2]
    ) AS answers
)
SELECT
  answer_id,
  key,
  element_at(answers, key) AS value
FROM data
CROSS JOIN UNNEST (map_keys(answers)) AS answer (key)
You could probably do something with transform_keys to create rows of the key value pairs, but the SQL above does the trick.
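In Presto/Athena you can also UNNEST the map directly, which yields key and value columns in one go and avoids the element_at() lookup; a slightly simpler variant of the same idea:
WITH data AS (
  SELECT
    'cc006' AS answer_id,
    MAP(
      ARRAY['101086', '101087', '101089', '101090', '101091', '101092', '101125'],
      ARRAY[1, 2, 2, 7, 5, 3, 2]
    ) AS answers
)
SELECT
  answer_id,
  key,
  value
FROM data
CROSS JOIN UNNEST (answers) AS answer (key, value)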

Hive explode with array of struct

I am trying to work out how to explode a complex type in Hive. I have the following Avro file that I want to use for my test and have built a Hive external table over it.
Here is my test data.
{"order_id":123456,"customer_id":987654,"total":305,"order_details":[{"quantity":5,"total":55,"product_detail":{"product_id":1000,"product_name":"Hugo Boss XY","product_description": {"string": "Hugo Xy Men 100 ml"}, "product_status": "AVAILABLE", "product_category":["fragrance","perfume"],"price":10.35,"product_hash":"XY123"}},{"quantity":5,"total":250,"product_detail":{"product_id":2000,"product_name":"Cherokee Polo T Shirt","product_description": {"string": "Cherokee Medium Blue Polo T Shirt"}, "product_status": "AVAILABLE", "product_category":["T-shirts","V-Neck","Cotton", "Medium"],"price":50.00,"product_hash":"XY789"}}]}
{"order_id":789012,"customer_id":4567324,"total":220,"order_details":[{"quantity":10,"total":120,"product_detail":{"product_id":1001,"product_name":"Hugo Men Red","product_description": {"string": "Hugo Men Red 150 ml"}, "product_status": "ONLY_FEW_LEFT", "product_category":["fragrance","perfume"],"price":12.99,"product_hash":"XY456"}},{"quantity":10,"total":100,"product_detail":{"product_id":2001,"product_name":"Ruggers Smart","product_description": {"string": "Ruggers Smart White Small Polo T Shirt"}, "product_status": "ONLY_FEW_LEFT", "product_category":["T-shirts","Round-Neck","Woolen", "Small"],"price":9.99,"product_hash":"XY987"}}]}
Avro schema
{
  "namespace": "com.treselle.db.model",
  "type": "record",
  "doc": "This Schema describes about Order",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "long"},
    {"name": "customer_id", "type": "long"},
    {"name": "total", "type": "float"},
    {"name": "order_details", "type": {
      "type": "array",
      "items": {
        "namespace": "com.treselle.db.model",
        "name": "OrderDetail",
        "type": "record",
        "fields": [
          {"name": "quantity", "type": "int"},
          {"name": "total", "type": "float"},
          {"name": "product_detail", "type": {
            "namespace": "com.treselle.db.model",
            "type": "record",
            "name": "Product",
            "fields": [
              {"name": "product_id", "type": "long"},
              {"name": "product_name", "type": "string", "doc": "This is the name of the product"},
              {"name": "product_description", "type": ["string", "null"], "default": ""},
              {"name": "product_status", "type": {"name": "product_status", "type": "enum", "symbols": ["AVAILABLE", "OUT_OF_STOCK", "ONLY_FEW_LEFT"]}, "default": "AVAILABLE"},
              {"name": "product_category", "type": {"type": "array", "items": "string"}, "doc": "This contains array of categories"},
              {"name": "price", "type": "float"},
              {"name": "product_hash", "type": {"type": "fixed", "name": "product_hash", "size": 5}}
            ]
          }}
        ]
      }
    }}
  ]
}
My Hive DDL
CREATE EXTERNAL TABLE orders (
  order_id bigint,
  customer_id bigint,
  total float,
  order_items array<
    struct<
      quantity:int,
      total:float,
      product_detail:struct<
        product_id:bigint,
        product_name:string,
        product_description:string,
        product_status:string,
        product_caretogy:array<string>,
        price:float,
        product_hash:binary
      >
    >
  >
)
STORED AS AVRO
LOCATION '/user/hive/test/orders';
Queries
SELECT order_id, customer_id FROM orders;
This works fine and returns the results from the 2 rows as expected.
But when I try to use explode with lateral view I hit problems.
SELECT
  order_id,
  customer_id,
  ord_dets.quantity as line_qty,
  ord_dets.total as line_total
FROM
  orders
LATERAL VIEW explode(order_items) exploded_table as ord_dets;
This query runs okay, but does not produce any results.
Any pointers as to what is wrong here?
The reason is that your DDL defines the column as order_items, but in the data and the Avro schema the field is called order_details. Hive looks for order_items, finds no such field in the Avro schema, and defaults to null.
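A minimal fix is to rename the column in the DDL to match the Avro field name; everything else stays the same (although, as the follow-up below notes, the enum and fixed fields may still cause trouble at query time):
CREATE EXTERNAL TABLE orders (
  order_id bigint,
  customer_id bigint,
  total float,
  order_details array<
    struct<
      quantity:int,
      total:float,
      product_detail:struct<
        product_id:bigint,
        product_name:string,
        product_description:string,
        product_status:string,
        product_caretogy:array<string>,
        price:float,
        product_hash:binary
      >
    >
  >
)
STORED AS AVRO
LOCATION '/user/hive/test/orders';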
Thanks for the pointer.
When I corrected that error I got errors at query time...
OK
Failed with exception java.io.IOException:org.apache.avro.AvroTypeException: Found com.treselle.db.model.order_details, expecting union
After further analysis I found that both the enum type and the fixed type in the Avro file caused the "expecting union" error.
After removing those columns I was able to query the Hive table successfully.
