SQL Server JSON QUERY - sql-server

I have to write a query that retrieves data as a single JSON object.
I have mixed columns: some of them hold JSON and some hold numeric values. While retrieving the data I need to convert each row into a single JSON object. My query does return a single object, but the results contain backslash (\) characters. Can someone help me rework the query below so that it produces each row as a single JSON object without the backslashes?
(select
(select
p.personReferenceNumber as personReferenceNumber,
personIdentity as personIdentity,
isnull([name],'') as [name],
p.gender as gender,
isnull(p.birthDateHijri,'') as birthDateHijri,
isnull(p.birthDateGregorian,'') as birthDateGregorian,
isnull(p.liveStatus,'') as liveStatus,
isnull(p.nationality,'') as nationality,
isnull(p.specific,'') as specific,
isnull(p.isDeleted,'') as isDeleted,
isnull(p.sessionId,'') as sessionId,
isnull(p.insertedBy,'') as insertedBy,
isnull(p.insertedTimeStamp,'') as insertedTimeStamp,
isnull(p.updatedBy,'') as updatedBy,
isnull(p.updatedTimestamp,'') as updatedTimestamp
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER) AS person
from person p)
Output:
{\"messageId\":\"f616dbd3-1352-404b-939e-5b12f90b57fe\",
\"transactionRowId\":594834948322275328,
\"personId\":\"bebox13\",
\"idType\":2,\"issueDateHijri\":14400203,
\"issueDateGregorian\":\"2019-04-12T03:00:00\",
\"expiryDateHijri\":14400104,
\"expiryDateGregorian\":\"2019-04-12T03:00:00\",
\"issuePlace\":[\"Update\",
\"Update\"]}",
"name":
"{\\\"messageId\\\":\\\"f616dbd3-1352-404b-939e-5b12f90b57fe\\\",
\\\"transactionRowId\\\":594834948322275328,
\\\"firstName\\\":[\\\"Update\\\",\\\"Update\\\"],\\\"secondName\\\":[\\\"Update\\\",\\\"Update\\\"],\\\"thirdName\\\":[\\\"Update\\\",\\\"Update\\\"],\\\"familyName\\\":[\\\"Update\\\",\\\"Update\\\"]}"

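For illustration only: the backslashes appear because a column that already holds JSON text is serialized again as an ordinary string, so every quote inside it gets escaped. In SQL Server, wrapping each column that holds JSON text in JSON_QUERY(...) inside the FOR JSON subquery makes FOR JSON embed the value as JSON instead of escaping it. The Python sketch below uses a made-up column value and only reproduces the two behaviors outside the database.

```python
import json

# Hypothetical value of a column that already stores JSON text
stored_name = '{"firstName":["Update","Update"]}'

# Serializing the raw text treats it as a plain string, so its quotes get escaped
escaped = json.dumps({"name": stored_name}, separators=(",", ":"))

# Parsing it first embeds it as a nested object (analogous to JSON_QUERY under FOR JSON)
embedded = json.dumps({"name": json.loads(stored_name)}, separators=(",", ":"))

print(escaped)   # {"name":"{\"firstName\":[\"Update\",\"Update\"]}"}
print(embedded)  # {"name":{"firstName":["Update","Update"]}}
```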
Related

Snowflake - extract JSON array string object values into pipe separated values

I have a nested JSON array, stored as a string, that has been loaded into a VARIANT column of a stage table. I want to extract a particular value from each string object and, if more than one object is found, combine the values into a pipe-separated list. Can someone help me achieve the desired output format?
Sample JSON data
{"issues": [
{
"expand": "",
"fields": {
"customfield_10010": [
"com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]",
"com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"
],
"customfield_10011": "obcd",
"customfield_10024": null,
"customfield_10034": null,
"customfield_10035": null,
"customfield_10037": null
},
"id": "123456",
"key": "SUE-1234",
"self": "xyz"
}]}
I have no idea how to separate the string objects inside an array in Snowflake.
Using the query below, I can convert the whole string into pipe-separated values.
select
a.value:id::number as ISSUE_ID,
a.value:key::varchar as ISSUE_KEY,
array_to_string(a.value:fields.customfield_10010, '|') as CF_10010_Data
from
ABC.VARIANT_TABLE,
lateral flatten( input => payload_json:issues) as a;
But I need to extract a particular value from each string object, for example the id values 1234 and 1239, and populate them as pipe-separated values as shown below.
ISSUE_ID ISSUE_KEY SPRINT_ID
123456 SUE-1234 1234|1239
Any idea on how to get the desired result is much appreciated. Thanks.
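The transformation being asked for can be prototyped in Python before writing the Snowflake version: find every id=<number> inside each sprint string and join the matches with "|". The regular expression requires "[" or "," directly before id= so that keys merely ending in "id" are not picked up. In Snowflake the equivalent would presumably combine a regular-expression function such as REGEXP_SUBSTR with LISTAGG over the flattened array, but that has not been checked here; the function name sprint_rows is made up for this sketch.

```python
import json
import re

# Trimmed copy of the sample payload from the question (only the fields used here)
payload = json.loads("""
{"issues": [{"fields": {"customfield_10010": [
  "com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]",
  "com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"
]}, "id": "123456", "key": "SUE-1234"}]}
""")

# Match "id=<digits>" only when "[" or "," comes right before "id"
SPRINT_ID = re.compile(r"[\[,]id=(\d+)")

def sprint_rows(doc):
    """Yield (issue_id, issue_key, pipe-separated sprint ids) for each issue."""
    for issue in doc["issues"]:
        sprints = issue["fields"].get("customfield_10010") or []
        ids = [m.group(1) for s in sprints for m in SPRINT_ID.finditer(s)]
        yield int(issue["id"]), issue["key"], "|".join(ids)

rows = list(sprint_rows(payload))
print(rows)  # [(123456, 'SUE-1234', '1234|1239')]
```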
It looks like the data within [...] for your sprints is just details about that sprint. I think it would be easiest to populate a separate sprints table with data on each sprint, and then join that table to the sprint ID values parsed from the API response with the issues data shown above.
with
jira_responses as (
select
$1 as id,
$2 as body
from (values
(1, '{"issues":[{"expand":"","fields":{"customfield_10010":["com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]","com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]"],"customfield_10011":"obcd","customfield_10024":null,"customfield_10034":null,"customfield_10035":null,"customfield_10037":null},"id":"123456","key":"SUE-1234","self":"xyz"}]}')
)
)
select
issues.value:id::integer as issue_id,
issues.value:key::string as issue_key,
get(split(sprints.value::string, '['), 0)::string as sprint_id
from jira_responses,
lateral flatten(input => parse_json(body):issues) issues,
lateral flatten(input => issues.value:fields:customfield_10010) sprints
Based on your sample data, the results would look like the following.
See Snowflake reference docs below.
"Querying Semi-structured Data"
PARSE_JSON
FLATTEN
SPLIT
GET
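To see what the sprint_id column in this answer contains, the Python sketch below reproduces the SPLIT/GET step on the two sprint strings from the question: each row gets the text before the first "[", such as com.atlassian.xxx.yyy.yyyy.Sprint#xyz456, with one row per sprint. That is the sprint reference rather than the id=1234 value the question asks for, so a further step (or the separate sprints table suggested above) is still needed to reach 1234|1239.

```python
# The two customfield_10010 entries from the question
sprints = [
    "com.atlassian.xxx.yyy.yyyy.Sprint#xyz456[completeDate=2020-07-20T20:19:06.163Z,endDate=2020-07-17T21:48:00.000Z,goal=,id=1234,name=SPR-SPR 8,rapidViewId=239,sequence=1234,startDate=2020-06-27T21:48:00.000Z,state=CLOSED]",
    "com.atlassian.xxx.yyy.yyyy.Sprint#abc123[completeDate=<null>,endDate=2020-08-07T20:33:00.000Z,goal=,id=1239,name=SPR-SPR 9,rapidViewId=239,sequence=1239,startDate=2020-07-20T20:33:26.364Z,state=ACTIVE]",
]

# Mirrors get(split(value, '['), 0): keep only the text before the first "["
sprint_ids = [s.split("[", 1)[0] for s in sprints]

for value in sprint_ids:
    print(value)
```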

Create a Nested/Repeating field using SQL in BigQuery which can be queried with dot notation (without UNNEST)

I am trying to build a data structure in BigQuery using SQL which exactly reflects the data structure which I obtain when uploading JSON. This will enable me to query the view using SQL with dot notation instead of having to UNNEST, which I do understand but many of my clients find extremely confusing and unintuitive.
If I build a really simple dummy dataset with a couple of rows and then nest using the ARRAY_AGG(STRUCT([field list])) pattern:
WITH
flat_table AS (
SELECT "BigQuery" AS name, 23 AS user_count, "Data Warehouse" AS data_thing, 5 AS ease_of_use, "Awesome" AS description UNION ALL
SELECT "MySQL" AS name, 12 AS user_count, "Database" AS data_thing, 3 AS ease_of_use, "Solid" AS description
)
SELECT
name, user_count,
ARRAY_AGG(STRUCT(data_thing, ease_of_use, description)) AS attributes
FROM flat_table
GROUP BY name, user_count
Then saving and viewing the schema shows that the attributes field is Type = RECORD and Mode = REPEATED. Schema field names are:
name
user_count
attributes
attributes.data_thing
attributes.ease_of_use
attributes.description
If I look at the COLUMN information in the INFORMATION_SCHEMA.COLUMNS query I can see that the attributes field is_nullable = NO and data_type = ARRAY<STRUCT<data_thing STRING, ease_of_use INT64, description STRING>>
If I want to query this structure I need to use the UNNEST pattern as below:
SELECT
name,
user_count
FROM
nested_table,
UNNEST(attributes)
WHERE
ease_of_use > 3
However when I upload the following JSON representation of the same data to BigQuery with automatic schema detection:
{"attributes":{"description":"Awesome","ease_of_use":5,"data_thing":"Data Warehouse"},"user_count":23,"name":"BigQuery"}
{"attributes":{"description":"Solid","ease_of_use":3,"data_thing":"Database"},"user_count":12,"name":"MySQL"}
The schema looks nearly identical once loaded, except for the attributes field is Mode = NULLABLE (it is still Type = RECORD). The INFORMATION_SCHEMA.COLUMNS shows me that the attributes field is now is_nullable = YES and data_type = STRUCT<data_thing STRING, ease_of_use INT64, description STRING>, i.e. now nullable and not in an array.
However the most interesting thing for me is that I can now query this table using dot notation instead of the UNNEST pattern, so the query above becomes:
SELECT
name,
user_count
FROM
nested_table_json
WHERE
attributes.ease_of_use > 3
Which is arguably easier to read, even in this trivial case. However once we get to more complex data structures with multiple nested fields and multi-level nesting, the UNNEST pattern becomes extremely difficult to write, QA and debug. The dot notation pattern appears to be much more intuitive and scalable.
So my question is: is it possible to build a data structure equivalent to the loaded JSON by writing queries in SQL, enabling us to build Standard SQL queries using dot notation and not requiring complex UNNEST patterns?
If you know that your ARRAY_AGG will produce exactly one element per group, you can drop the ARRAY by taking its first element:
SELECT
name, user_count,
ARRAY_AGG(STRUCT(data_thing, ease_of_use, description))[OFFSET(0)] AS attributes
FROM flat_table
GROUP BY name, user_count
Notice the use of OFFSET(0). This way the returned output will be:
[
{
"name": "BigQuery",
"user_count": "23",
"attributes": {
"data_thing": "Data Warehouse",
"ease_of_use": "5",
"description": "Awesome"
}
}
]
which can be queried using dot notation.
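As a rough Python model of what [OFFSET(0)] does (the data mirrors the flat_table CTE from the question): ARRAY_AGG collects a list of structs per group and OFFSET(0) keeps only the first one, so attributes becomes a single record that can be addressed with a plain field path. The model also shows the caveat: any further elements in a group would be silently dropped.

```python
from collections import defaultdict

# Same rows as the flat_table CTE in the question
flat_table = [
    {"name": "BigQuery", "user_count": 23, "data_thing": "Data Warehouse", "ease_of_use": 5, "description": "Awesome"},
    {"name": "MySQL", "user_count": 12, "data_thing": "Database", "ease_of_use": 3, "description": "Solid"},
]

# GROUP BY name, user_count with ARRAY_AGG(STRUCT(...)): each group gets a list of structs
groups = defaultdict(list)
for row in flat_table:
    groups[(row["name"], row["user_count"])].append(
        {k: row[k] for k in ("data_thing", "ease_of_use", "description")}
    )

# [OFFSET(0)] keeps only the first element, so attributes becomes a single struct
nested = [
    {"name": name, "user_count": count, "attributes": structs[0]}
    for (name, count), structs in groups.items()
]

# Analogous to WHERE attributes.ease_of_use > 3
high_ease = [r["name"] for r in nested if r["attributes"]["ease_of_use"] > 3]
print(high_ease)  # ['BigQuery']
```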
If you just want to group the result into a STRUCT, you don't need ARRAY_AGG at all:
WITH
flat_table AS (
SELECT "BigQuery" AS name, 23 AS user_count, struct("Data Warehouse" AS data_thing, 5 AS ease_of_use, "Awesome" AS description) as attributes UNION ALL
SELECT "MySQL" AS name, 12 AS user_count, struct("Database" AS data_thing, 3 AS ease_of_use, "Solid" AS description)
)
SELECT
*
FROM flat_table

Is there a way to return either a string or embedded JSON using FOR JSON?

I have an nvarchar column that I would like to return embedded in my JSON results if its contents are valid JSON, or as a string otherwise.
Here is what I've tried:
select
(
case when IsJson(Arguments) = 1 then
Json_Query(Arguments)
else
Arguments
end
) Results
from Unit
for json path
This always puts Results into a string.
The following works, but only if the attribute contains valid JSON:
select
(
Json_Query(
case when IsJson(Arguments) = 1 then
Arguments
else
'"' + String_escape(IsNull(Arguments, ''), 'json') + '"' end
)
) Results
from Unit
for json path
If Arguments does not contain a JSON object, a runtime error occurs.
Update: Sample data:
Arguments
---------
{ "a": "b" }
Some text
Update: any version of SQL Server will do. I'd even be happy to know that it's coming in a beta or something.
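A rough Python model of the behavior being asked for, assuming "valid JSON" means a JSON object or array (the kinds of value JSON_QUERY returns) and that NULL should simply be left out, as FOR JSON PATH does by default. The helper names results_value and to_row are made up for this sketch; it describes the target output, not a SQL Server solution.

```python
import json

def results_value(arguments):
    """Return the parsed value if the text is a JSON object or array, otherwise the raw text."""
    try:
        parsed = json.loads(arguments)
    except ValueError:
        return arguments  # not JSON at all: keep it as a plain string
    # Only objects and arrays are embedded, matching the kinds of value JSON_QUERY returns
    return parsed if isinstance(parsed, (dict, list)) else arguments

def to_row(arguments):
    # FOR JSON PATH leaves out NULL values by default, so a NULL gives an empty object here
    if arguments is None:
        return {}
    return {"Results": results_value(arguments)}

sample = ['{ "a": "b" }', "Some text"]
output = json.dumps([to_row(a) for a in sample], separators=(",", ":"))
print(output)  # [{"Results":{"a":"b"}},{"Results":"Some text"}]
```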
I did not find a good solution and would be happy if someone comes up with a better one than this hack:
DECLARE @tbl TABLE(ID INT IDENTITY,Arguments NVARCHAR(MAX));
INSERT INTO @tbl VALUES
(NULL)
,('plain text')
,('[{"id":"1"},{"id":"2"}]');
SELECT t1.ID
,(SELECT Arguments FROM @tbl t2 WHERE t2.ID=t1.ID AND ISJSON(Arguments)=0) Arguments
,(SELECT JSON_QUERY(Arguments) FROM @tbl t2 WHERE t2.ID=t1.ID AND ISJSON(Arguments)=1) ArgumentsJSON
FROM @tbl t1
FOR JSON PATH;
As NULL values are omitted, you will always find either Arguments or ArgumentsJSON in your final result. If you treat this JSON as NVARCHAR(MAX), you can use REPLACE to rename both to the same name, Arguments.
The problem seems to be that you cannot include two columns with the same name in your SELECT, and each column must have a predictable type. This depends on the order you use in CASE (or COALESCE). If the engine thinks "Okay, here's text", everything will be treated as text and your JSON will be escaped. But if the engine thinks "Okay, some JSON", everything is handled as JSON and will break if this JSON is not valid.
With FOR XML PATH there are some tricks with column naming (such as [*], [node()], or even using the same name twice within one query), but FOR JSON PATH is not that powerful.
When you say that your statement "... always puts Results into a string.", you probably mean that when JSON is stored in a text column, FOR JSON escapes this text. If you want to return unescaped JSON text, you need to apply the JSON_QUERY function, and only to valid JSON text.
The following is a small workaround (based on FOR JSON and string manipulation) that may help solve your problem.
Table:
CREATE TABLE #Data (
Arguments nvarchar(max)
)
INSERT INTO #Data
(Arguments)
VALUES
('{"a": "b"}'),
('Some text'),
('{"c": "d"}'),
('{"e": "f"}'),
('More[]text')
Statement:
SELECT CONCAT(N'[', j1.JsonOutput, N',', j2.JsonOutput, N']')
FROM
(
SELECT JSON_QUERY(Arguments) AS Results
FROM #Data
WHERE ISJSON(Arguments) = 1
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) j1 (JsonOutput),
(
SELECT STRING_ESCAPE(ISNULL(Arguments, ''), 'json') AS Results
FROM #Data
WHERE ISJSON(Arguments) = 0
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER
) j2 (JsonOutput)
Output:
[{"Results":{"a": "b"}},{"Results":{"c": "d"}},{"Results":{"e": "f"}},{"Results":"Some text"},{"Results":"More[]text"}]
Notes:
One disadvantage here is that the order of the items in the generated output is not the same as in the table.
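To make the ordering caveat concrete, here is a small Python re-creation of the same string-concatenation idea using the rows from the #Data table: valid-JSON rows are emitted first and plain-text rows after them, which is why the order differs from the table. The helper looks_like_json is a made-up stand-in for ISJSON and may not agree with it in every edge case.

```python
import json

# Same rows as the #Data table above
rows = ['{"a": "b"}', "Some text", '{"c": "d"}', '{"e": "f"}', "More[]text"]

def looks_like_json(text):
    """Rough stand-in for ISJSON: True if Python's parser accepts the text."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True

# j1: valid JSON is spliced in verbatim, like JSON_QUERY(Arguments) under FOR JSON
j1 = ['{"Results":' + r + "}" for r in rows if looks_like_json(r)]
# j2: everything else becomes an escaped JSON string
j2 = ['{"Results":' + json.dumps(r) + "}" for r in rows if not looks_like_json(r)]

# Concatenating the two groups is what reorders the items
combined = "[" + ",".join(j1 + j2) + "]"
print(combined)
```

One difference worth checking in the SQL version: joining with ",".join never leaves a stray comma, whereas the fixed N',' in the CONCAT would presumably produce invalid JSON if one of the two groups returned no rows.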

BigQuery or SQL Server SPLIT query

I have searched around and cannot find much on this topic. I have a table that stores logging information. The column I am interested in contains multiple values that I need to search against. It is formatted like a PHP-style URL, e.g.:
/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32
This means every search ends up with a really long regex to get the data, followed by joins to combine it.
Is there a way in BigQuery, or SQL Server that I can pull the information from that column and put it into new columns?
Example:
The information I would like extracted begins after the ? and each value ends at &. The string can sometimes be longer and contain additional headers.
Thanks,
The following is for BigQuery Standard SQL and addresses this aspect of your question:
Is there a way in BigQuery, ... that I can pull the information from that column and put it into new columns?
#standardSQL
CREATE TEMP FUNCTION parseColumn(kv STRING, column_name STRING) AS (
IF(SPLIT(kv, '=')[OFFSET(0)]= column_name, SPLIT(kv, '=')[OFFSET(1)], NULL)
);
WITH `project.dataset.table` AS (
SELECT '/test/test.aspx?extra=abc&DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url UNION ALL
SELECT '/test/test.aspx?DS_Vendor=55192&DS_ProdVer=4.30.100.0&more=123&DS_ProdLang=DE&DS_Product=MTE&DS_OfficeBits=64'
)
SELECT
MIN(parseColumn(kv, 'DS_Vendor')) AS DS_Vendor,
MIN(parseColumn(kv, 'DS_ProdVer')) AS DS_ProdVer,
MIN(parseColumn(kv, 'DS_ProdLang')) AS DS_ProdLang,
MIN(parseColumn(kv, 'DS_Product')) AS DS_Product,
MIN(parseColumn(kv, 'DS_OfficeBits')) AS DS_OfficeBits
FROM `project.dataset.table`,
UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS kv
GROUP BY url
with the following result:
Row DS_Vendor DS_ProdVer DS_ProdLang DS_Product DS_OfficeBits
1 55039 7.90.100.0 EN MTT 32
2 55192 4.30.100.0 DE MTE 64
This also covers the following part of the question:
The string can sometimes be longer, and contains additional headers.
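The same extraction logic, mirrored in Python for anyone who wants to test the regular expression outside BigQuery: re.findall uses the same [?&]([^?&]+) pattern as REGEXP_EXTRACT_ALL, and each key=value part is split on the first "=". One small difference from the query: when a key repeats, this sketch keeps the first value, whereas MIN() keeps the smallest. The function name parse_columns is made up for the sketch.

```python
import re

urls = [
    "/test/test.aspx?extra=abc&DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32",
    "/test/test.aspx?DS_Vendor=55192&DS_ProdVer=4.30.100.0&more=123&DS_ProdLang=DE&DS_Product=MTE&DS_OfficeBits=64",
]
COLUMNS = ["DS_Vendor", "DS_ProdVer", "DS_ProdLang", "DS_Product", "DS_OfficeBits"]

def parse_columns(url):
    """Return one dict per URL with only the wanted keys (missing keys become None)."""
    pairs = {}
    for kv in re.findall(r"[?&]([^?&]+)", url):  # same pattern as REGEXP_EXTRACT_ALL
        key, _, value = kv.partition("=")
        pairs.setdefault(key, value)  # first occurrence wins if a key repeats
    return {col: pairs.get(col) for col in COLUMNS}

table = [parse_columns(u) for u in urls]
for row in table:
    print(row)
```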
One example using BigQuery (with standard SQL):
SELECT REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')
FROM (
SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
This returns the parts of the URL as an ARRAY<STRING>. To go one step further, you can get back an ARRAY<STRUCT<key STRING, value STRING>> with a query of this form:
SELECT
ARRAY(
SELECT AS STRUCT
SPLIT(part, '=')[OFFSET(0)] AS key,
SPLIT(part, '=')[OFFSET(1)] AS value
FROM UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS part
) AS keys_and_values
FROM (
SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
...or with the keys and values as top-level columns:
SELECT
SPLIT(part, '=')[OFFSET(0)] AS key,
SPLIT(part, '=')[OFFSET(1)] AS value
FROM (
SELECT '/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32' AS url
)
CROSS JOIN UNNEST(REGEXP_EXTRACT_ALL(url, r'[?&]([^?&]+)')) AS part
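Outside the database, Python's standard library already parses query strings into (key, value) pairs, which is the same shape as the ARRAY<STRUCT<key STRING, value STRING>> above. Note that parse_qsl also decodes percent-escapes and "+" characters, which the plain SPLIT-based query does not.

```python
from urllib.parse import parse_qsl, urlsplit

url = "/test/test.aspx?DS_Vendor=55039&DS_ProdVer=7.90.100.0&DS_ProdLang=EN&DS_Product=MTT&DS_OfficeBits=32"

# urlsplit isolates the part after "?", parse_qsl turns it into (key, value) pairs
keys_and_values = parse_qsl(urlsplit(url).query, keep_blank_values=True)
for key, value in keys_and_values:
    print(key, value)
```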

SQL Server Querying An XML Field

I have a table that contains some meta data in an XML field.
For example
<Meta>
<From>tst#test.com</From>
<To>
<Address>testing#123.com</Address>
<Address>2#2.com</Address>
</To>
<Subject>Subject Goes Here</Subject>
</Meta>
I want to then be able to query this field to return the following results
From To Subject
tst#test.com testing#123.com Subject Goes Here
tst#test.com 2#2.com Subject Goes Here
I've written the following query
SELECT
MetaData.query('data(/Meta/From)') AS [From],
MetaData.query('data(/Meta/To/Address)') AS [To],
MetaData.query('data(/Meta/Subject)') AS [Subject]
FROM
Documents
However, this only returns one record for that XML field: it combines the two addresses into one result. Is it possible to split these into separate records?
The result I'm getting is
From To Subject
tst#test.com testing#123.com 2#2.com Subject Goes Here
Thanks
Gav
You need to return the XML and then parse it using something like the following code:
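using System.IO;
using System.Xml;

// stringFromSQL holds the XML text read from the MetaData column
using (StringReader stream = new StringReader(stringFromSQL))
using (XmlReader reader = XmlReader.Create(stream))
{
    while (reader.Read())
    {
        // Do stuff, e.g. react to each <Address> element
    }
}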
Where stringFromSQL is the whole string as read from your table.
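If the parsing is done in application code, the same one-row-per-address shape can be produced with Python's xml.etree.ElementTree, sketched below with the sample XML from the question. Within SQL Server itself, the usual way to get one row per <Address> is the XML nodes() method combined with CROSS APPLY, which this sketch does not cover. The function name meta_rows is made up for illustration.

```python
import xml.etree.ElementTree as ET

meta_xml = """
<Meta>
  <From>tst#test.com</From>
  <To>
    <Address>testing#123.com</Address>
    <Address>2#2.com</Address>
  </To>
  <Subject>Subject Goes Here</Subject>
</Meta>
"""

def meta_rows(xml_text):
    """Return one (From, To, Subject) tuple per <Address> under <To>."""
    root = ET.fromstring(xml_text.strip())
    sender = root.findtext("From")
    subject = root.findtext("Subject")
    return [(sender, addr.text, subject) for addr in root.findall("To/Address")]

for row in meta_rows(meta_xml):
    print(*row, sep="\t")
```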
