Display JSON array elements as one line in U-SQL - arrays

How do I display each JSON array element as a comma separated element in one line, rather than one element per line, in U-SQL?
For example, the JSON file is:
{
"A": {
"A1": "1",
"A2": 0
},
"B": {
"B1": "1",
"B2": 0
},
"C": {
"C1": [
{
"D1": "1"
},
{
"D2": "2"
},
{
"D3": "3"
},
{
"D4": "4"
},
{
"D5": "5"
},
{
"D6": "6"
},
{
"D7": "7"
}
]
}
}
The code to process this fragment for the array C1 is as follows:
#sql = SELECT
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C)["C1"] AS C1_array
FROM #json;
OUTPUT #sql TO "test.txt" USING Outputters.Csv(quoting: false);
#sql2 = SELECT
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1_array) AS C1
FROM #sql
CROSS APPLY
EXPLODE (Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1_array).Values) AS D(C1);
#result =
SELECT C1["D1"]AS D1,
C1["D2"] AS D2,
C1["D3"]AS D3,
C1["D4"]AS D4,
C1["D5"]AS D5,
C1["D6"]AS D6,
C1["D7"]AS D7,
FROM #sql2;
OUTPUT #result TO "output.txt" USING Outputters.Text();
The result that all the array elements print out as one per line, ie, all the D1 through D7 elements are on separate lines. I want the D1 through D7 elements to be part of the same line, as it is part of the JSON object.
That is:
1, 2, 3, 4, 5, 6, 7
How can this be done?

The important part is that the C1 array contains one item per Di. So if you treat it as an item per row, you will get separate rows. In this case you want one row for all of C1.
The following does this in two ways: One time you know what the Ds are and one time if you do not know and still want them in one row (now all in one cell).
REFERENCE ASSEMBLY JSONBlog.[Newtonsoft.Json];
REFERENCE ASSEMBLY JSONBlog.[Microsoft.Analytics.Samples.Formats];
// Get one row per C and get the C1 array as column
#d = EXTRACT C1 string FROM "/Temp/ABCD.txt" USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("C");
// Keep one row per C and get all the items from within the C1 array
#d =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1, "[*].*") AS DMap
FROM #d;
// Get individual items
#d1 =
SELECT
DMap["[0].D1"] AS D1,
DMap["[1].D2"] AS D2,
DMap["[2].D3"] AS D3,
DMap["[3].D4"] AS D4,
DMap["[4].D5"] AS D5,
DMap["[5].D6"] AS D6,
DMap["[6].D7"] AS D7
FROM #d;
// Keep it generic and get all item in a single column
#d2 =
SELECT String.Join("\t", DMap.Values) AS Ds
FROM #d;
OUTPUT #d1
TO "/Temp/D-Out1.tsv"
USING Outputters.Tsv();
OUTPUT #d2
TO "/Temp/D-Out2.tsv"
USING Outputters.Tsv(quoting:false);
As you can see, the JsonTuple function can take a JSONPath expression and then it uses all found paths in the resulting map as keys.

Related

SQL Server table data to JSON Path result

I am looking for a solution to convert the table results to a JSON path.
I have a table with two columns as below. Column 1 Will always have normal values, but column 2 will have values up to 15 separated by ';' (semicolon).
ID Column1 Column2
--------------------------------------
1 T1 Re;BoRe;Va
I want to convert the above column data in to below JSON Format
{
"services":
[
{ "service": "T1"}
],
"additional_services":
[
{ "service": "Re" },
{ "service": "BoRe" },
{ "service": "Va" }
]
}
I have tried creating something like the below, but cannot get to the exact format that I am looking for
SELECT
REPLACE((SELECT d.Column1 AS services, d.column2 AS additional_services
FROM Table1 w (nolock)
INNER JOIN Table2 d (nolock) ON w.Id = d.Id
WHERE ID = 1
FOR JSON PATH), '\/', '/')
Please let me know if this is something we can achieve using T-SQL
As I mention in the comments, I strongly recommend you fix your design and normalise your design. Don't store delimited data in your database; Re;BoRe;Va should be 3 rows, not 1 delimited one. That doesn't mean you can't achieve what you want with your denormalised data, just that your design is flawed, and thus it needs being brought up.
One way to achieve what you're after is with some nested FOR JSON calls:
SELECT (SELECT V.Column1 AS service
FOR JSON PATH) AS services,
(SELECT SS.[value] AS service
FROM STRING_SPLIT(V.Column2,';') SS
FOR JSON PATH) AS additional_services
FROM (VALUES(1,'T1','Re;BoRe;Va'))V(ID,Column1,Column2)
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;
This results in the following JSON:
{
"services": [
{
"service": "T1"
}
],
"additional_services": [
{
"service": "Re"
},
{
"service": "BoRe"
},
{
"service": "Va"
}
]
}

How Can I Calculate the Average of Floats in a Nested Array in a Variant Column

I have a VARIANT column that contains a JSON response from a web service. It contains a nested array with a float value that I would like to aggregate and return as an average. Here is an example SnowSQL command that I am using:
select
value:disambiguated.id,
value:mentions
from TABLE(
FLATTEN(input =>
PARSE_JSON('{ "entities": [{"count": 2,"disambiguated": {"id": 123},"label": "Coronavirus Disease 2019","mentions": [{"confidence": 0.5928,}, {"confidence": 0.5445,}],"type": "MEDICAL"}]}'):entities
)
)
Which returns:
VALUE:DISAMBIGUATED.ID VALUE:MENTIONS
123 [ { "confidence": 0.5928 }, { "confidence": 0.5445 } ]
What I would like to return is something with the two "confidence" values averaged to 0.56825. I was able to add a second FLATTEN statement which isolated the "mentions" array and allowed me to extract each "confidence" value. I can not seem to figure out how to group the records to calculate the average. Would love to use the built in AVG() function if possible. Thank you in advance for any help you can provide.
Using your example, you can use LATERAL FLATTEN to create your required flattened fields, and then aggregate as you normally would. In this example, I'm grouping on the ID that is in the data, but you could also use y.index or z.index depending on which of those you wanted to group on for your AVG().
WITH x AS (
SELECT PARSE_JSON('{ "entities": [{"count": 2,"disambiguated": {"id": 123},"label": "Coronavirus Disease 2019","mentions": [{"confidence": 0.5928,}, {"confidence": 0.5445,}],"type": "MEDICAL"}]}') as json_str
)
SELECT
y.value:disambiguated.id as id,
avg(z.value:confidence)
from x,
LATERAL FLATTEN(input => json_str:entities) y,
LATERAL FLATTEN(input => y.value:mentions) z
GROUP BY id
;

Query JSON Key:Value Pairs in AWS Athena

I have received a data set from a client that is loaded in AWS S3. The data contains unnamed JSON key:value pairs. This isn't my area of expertise, so I was looking for a little help.
The structure of JSON data that I've typically worked with in the past looks similar to this:
{ "name":"John", "age":30, "car":null }
The data that I have received from my client is formatted as such:
{
"answer_id": "cc006",
"answer": {
"101086": 1,
"101087": 2,
"101089": 2,
"101090": 7,
"101091": 5,
"101092": 3,
"101125": 2
}
}
This is survey data, where the key on the left is a numeric customer identifier, and the value on the right is their response to a survey question, i.e. customer "101125" answered the survey with a value of "2". I need to be able to query the JSON data using Athena such that my result set looks similar to:
Cross joining the unnested children against the parent node isn't an issue. What I can't figure out is how to select all of the keys from the array "answer" without specifying that actual key name. Similarly, I want to be able to select all of the values as well.
Is it possible to create a virtual table in Athena that would allow for these results, or do I need to convert the JSON to a format this looks more similar to the following:
{
"answer_id": "cc006",
"answer": [
{ "key": "101086", "value": 1 },
{ "key": "101087", "value": 2 },
{ "key": "101089", "value": 2 },
{ "key": "101090", "value": 7 },
{ "key": "101091", "value": 5 },
{ "key": "101092", "value": 3 },
{ "key": "101125", "value": 2 }
]
}
EDIT 6/4/2020
I was able to use the code that Theon provided below along with the following table structure:
CREATE EXTERNAL TABLE answer_example (
answer_id string,
answer string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://mybucket/'
That allowed me to use the following query to generate the results that I needed.
WITH Data AS(
SELECT
answer_id,
CAST(json_extract(answer, '$') AS MAP(VARCHAR, VARCHAR)) as answer
FROM
answer_example
)
SELECT
answer_id,
key,
element_at(answer, key) AS value
FROM
Data
CROSS JOIN UNNEST (map_keys(answer)) AS answer (key)
EDIT 6/5/2020
Taking additional advice from Theon's response below, the following DDL and Query simplify this quite a bit.
DDL:
CREATE EXTERNAL TABLE answer_example (
answer_id string,
answer map<string,string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://mybucket/'
Query:
SELECT
answer_id,
key,
element_at(answer, key) AS value
FROM
answer_example
CROSS JOIN UNNEST (map_keys(answer)) AS answer (key)
Cross joining with the keys of the answer property and then picking the corresponding value. Something like this:
WITH data AS (
SELECT
'cc006' AS answer_id,
MAP(
ARRAY['101086', '101087', '101089', '101090', '101091', '101092', '101125'],
ARRAY[1, 2, 2, 7, 5, 3, 2]
) AS answers
)
SELECT
answer_id,
key,
element_at(answers, key) AS value
FROM data
CROSS JOIN UNNEST (map_keys(answers)) AS answer (key)
You could probably do something with transform_keys to create rows of the key value pairs, but the SQL above does the trick.

MariaDB JSON_ARRAY_APPEND Object instead of it's string representation

I am receiving quoted string values in an array instead of an array of objects when I use JSON_ARRAY_APPEND() to insert a string that represents an object.
I need a way to force the value inserted to the array to be the object instead of it's string representation.
Server:
10.2.18-MariaDB-log
MariaDB Server
Linux x86_64
Here is a sample I am trying to get work:
set #NewArrayItem = '{"item2": "value2"}';
SELECT JSON_ARRAY_APPEND('{"SomeData": "SomeValue", "AnArray": [{"item1": "value1"}]}', '$.AnArray', #NewArrayItem ) as outval;
The second element in the array ($.AnArray[1]) is a string instead of an object.
I am expecting:
{"SomeData": "SomeValue", "AnArray": [{"item1": "value1"}, {"item2": "value2"}]}
But I actually get:
{"SomeData": "SomeValue", "AnArray": [{"item1": "value1"}, "{\"item2\": \"value2\"}"]}
I see that the following works, but my constraint is that value #NewArrayItem is a properly formatted string from another application:
SELECT JSON_ARRAY_APPEND('{"SomeData": "SomeValue", "AnArray": [{"item1": "value1"}]}', '$.AnArray', JSON_OBJECT('item2','value2') ) as outval;
I solved this with a combination of JSON_SET, JSON_MERGE, and JSON_QUERY:
set #ExistingData = '{"SomeData": "SomeValue", "AnArray": [{"item1": "value1"}]}';
set #NewArrayItem = '{"item2": "value2"}';
SELECT JSON_SET(#ExistingData, '$.AnArray', JSON_MERGE(ifnull(JSON_QUERY(#ExistingData, '$.AnArray'),'[]'),#NewArrayItem) ) as outval;
As a bonus, it also works for the case where the array does not exist:
set #ExistingData = '{"SomeData": "SomeValue"}';
set #NewArrayItem = '{"item2": "value2"}';
SELECT JSON_SET(#ExistingData, '$.AnArray', JSON_MERGE(ifnull(JSON_QUERY(#ExistingData, '$.AnArray'),'[]'),#NewArrayItem) ) as outval;
Still looking for a more simple answer.
I try this example
SET #json = '[]';
SELECT #json; // []
SET #json = JSON_ARRAY_APPEND(#json, '$', JSON_OBJECT("id", 1, "name", "Month1"));
SELECT #json; // [{"id": 1, "name": "Month1"}]
SET #json = JSON_ARRAY_APPEND(#json, '$', JSON_OBJECT("id", 2, "name", "Month2"));
SELECT #json; // [{"id": 1, "name": "Month1"}, {"id": 2, "name": "Month2"}]

How to delete array element in JSONB column based on nested key value?

How can I remove an object from an array, based on the value of one of the object's keys?
The array is nested within a parent object.
Here's a sample structure:
{
"foo1": [ { "bar1": 123, "bar2": 456 }, { "bar1": 789, "bar2": 42 } ],
"foo2": [ "some other stuff" ]
}
Can I remove an array element based on the value of bar1?
I can query based on the bar1 value using: columnname #> '{ "foo1": [ { "bar1": 123 } ]}', but I've had no luck finding a way to remove { "bar1": 123, "bar2": 456 } from foo1 while keeping everything else intact.
Thanks
Running PostgreSQL 9.6
Assuming that you want to search for a specific object with an inner object of a certain value, and that this specific object can appear anywhere in the array, you need to unpack the document and each of the arrays, test the inner sub-documents for containment and delete as appropriate, then re-assemble the array and the JSON document (untested):
SELECT id, jsonb_build_object(key, jarray)
FROM (
SELECT foo.id, foo.key, jsonb_build_array(bar.value) AS jarray
FROM ( SELECT id, key, value
FROM my_table, jsonb_each(jdoc) ) foo,
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value #> '{"bar1": 123}'::jsonb
GROUP BY 1, 2 ) x
GROUP BY 1;
Now, this may seem a little dense, so picked apart you get:
SELECT id, key, value
FROM my_table, jsonb_each(jdoc)
This uses a lateral join on your table to take the JSON document jdoc and turn it into a set of rows foo(id, key, value) where the value contains the array. The id is the primary key of your table.
Then we get:
SELECT foo.id, foo.key, jsonb_build_array(bar.value) AS jarray
FROM foo, -- abbreviated from above
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value #> '{"bar1": 123}'::jsonb
GROUP BY 1, 2
This uses another lateral join to unpack the arrays into bar(value) rows. These objects can now be searched with the containment operator to remove the objects from the result set: WHERE NOT bar.value #> '{"bar1": 123}'::jsonb. In the select list the arrays are re-assembled by id and key but now without the offending sub-documents.
Finally, in the main query the JSON documents are re-assembled:
SELECT id, jsonb_build_object(key, jarray)
FROM x -- from above
GROUP BY 1;
The important thing to understand is that PostgreSQL JSON functions only operate on the level of the JSON document that you can explicitly indicate. Usually that is the top level of the document, unless you have an explicit path to some level in the document (like {foo1, 0, bar1}, but you don't have that). At that level of operation you can then unpack to do your processing such as removing objects.

Resources