A table has a ports column (defined as VARCHAR) which has the following data:
[{u'position': 1, u'macAddress': u'00:8C:FA:C1:7C:88'}, {u'position':
2, u'macAddress': u'00:8C:FA:5E:98:81'}]
I want to extract the data from just the macAddress fields into separate rows. I tried to flatten the data in Snowflake but it is not working as the column is not defined as VARIANT and the the fields have a 'u' in front of them (this is my guess).
00:8C:FA:C3:7C:84
00:5C:FA:7E:98:87
Could someone please help with the requirement.
The provided JSON is not a valid JSON but it is possible to treat it as one with text operations and PARSE_JSON:
SELECT s.value:macAddress::TEXT AS macAddress
FROM t
,LATERAL FLATTEN(INPUT => PARSE_JSON(REPLACE(REPLACE(col, 'u''', ''''), '''', '"')))
AS s;
For input:
CREATE OR REPLACE TABLE t(col TEXT)
AS
SELECT $$[{u'position': 1, u'macAddress': u'00:8C:FA:C1:7C:88'}, {u'position': 2, u'macAddress': u'00:8C:FA:5E:98:81'}]$$;
Output:
I want to split strings into columns.
My columns should be:
account_id, resource_type, resource_name
I have a JSON file source that I have been trying to parse via ADF data flow. That hasn't worked for me, hence I flattened the data and brought it into SQL Server (I am open to parsing values via ADF or SQL if anyone can show me how). Please check the JSON file at the bottom.
Use this code to query the data I am working with.
CREATE TABLE test.test2
(
resource_type nvarchar(max) NULL
)
INSERT INTO test.test2 ([resource_type])
VALUES
('account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory'),
('account_id:535533456241,resource_type:buckets,resource_name:tni-prod-diva-backups'),
('account_id:369798452057,resource_type:buckets,resource_name:369798452057-s3-manifests'),
('account_id:460085747812,resource_type:buckets,resource_name:vessel-incident-report-nonprod-accesslogs')
The output that I should be able to query in SQL Server should like this:
account_id
resource_type
resource_name
224526257458
buckets
camp-stage-artifactory
535533456241
buckets
tni-prod-diva-backups
and so forth.
Please help me out and ask for clarification if needed. Thanks in advance.
EDIT:
Source JSON Format:
{
"start_date": "2021-12-01 00:00:00+00:00",
"end_date": "2021-12-31 23:59:59+00:00",
"resource_type": "all",
"records": [
{
"directconnect_connections": [
"account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fh40evn5'",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-ffxgf6kh'",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-fg5j5v6o'",
"account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fgvfo1ej'"
]
},
{
"virtual_interfaces": [
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgvj25vt'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgbw5gs0'",
"account_id:401311080156,resource_type:virtual_interfaces,resource_name:'dxvif-ffnosohr'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fg18bdhl'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffmf6h64'",
"account_id:390251991779,resource_type:virtual_interfaces,resource_name:'dxvif-fgkxjhcj'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffp6kl3f'"
]
}
]
}
Since you don't have a valid JSON string and not wanting to get in the business of string manipulation... perhaps this will help.
Select B.*
From test2 A
Cross Apply ( Select account_id = max(case when value like 'account_id:%' then stuff(value,1,11,'') end )
,resource_type = max(case when value like 'resource_type:%' then stuff(value,1,14,'') end )
,resource_name = max(case when value like 'resource_name:%' then stuff(value,1,14,'') end )
from string_split(resource_type,',')
)B
Results
account_id resource_type resource_name
224526257458 buckets camp-stage-artifactory
535533456241 buckets tni-prod-diva-backups
369798452057 buckets 369798452057-s3-manifests
460085747812 buckets vessel-incident-report-nonprod-accesslogs
Unfortunately, the values inside the arrays are not valid JSON. You can patch them up by adding {} to the beginning/end, and adding " on either side of : and ,.
DECLARE #json nvarchar(max) = N'{
"start_date": "2021-12-01 00:00:00+00:00",
"end_date": "2021-12-31 23:59:59+00:00",
"resource_type": "all",
"records": [
{
"directconnect_connections": [
"account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fh40evn5''",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-ffxgf6kh''",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-fg5j5v6o''",
"account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fgvfo1ej''"
]
},
{
"virtual_interfaces": [
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgvj25vt''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgbw5gs0''",
"account_id:401311080156,resource_type:virtual_interfaces,resource_name:''dxvif-ffnosohr''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fg18bdhl''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffmf6h64''",
"account_id:390251991779,resource_type:virtual_interfaces,resource_name:''dxvif-fgkxjhcj''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffp6kl3f''"
]
}
]
}';
SELECT
j4.account_id,
j4.resource_type,
TRIM('''' FROM j4.resource_name) resource_name
FROM OPENJSON(#json, '$.records') j1
CROSS APPLY OPENJSON(j1.value) j2
CROSS APPLY OPENJSON(j2.value) j3
CROSS APPLY OPENJSON('{"' + REPLACE(REPLACE(j3.value, ':', '":"'), ',', '","') + '"}')
WITH (
account_id bigint,
resource_type varchar(20),
resource_name varchar(100)
) j4;
db<>fiddle
The first three calls to OPENJSON have no schema, so the resultset is three columns: key value and type. In the case of arrays (j1 and j3), key is the index into the array. In the case of single objects (j2), key is each property name.
I have a VARIANT column that contains a JSON response from a web service. It contains a nested array with a float value that I would like to aggregate and return as an average. Here is an example SnowSQL command that I am using:
select
value:disambiguated.id,
value:mentions
from TABLE(
FLATTEN(input =>
PARSE_JSON('{ "entities": [{"count": 2,"disambiguated": {"id": 123},"label": "Coronavirus Disease 2019","mentions": [{"confidence": 0.5928,}, {"confidence": 0.5445,}],"type": "MEDICAL"}]}'):entities
)
)
Which returns:
VALUE:DISAMBIGUATED.ID VALUE:MENTIONS
123 [ { "confidence": 0.5928 }, { "confidence": 0.5445 } ]
What I would like to return is something with the two "confidence" values averaged to 0.56825. I was able to add a second FLATTEN statement which isolated the "mentions" array and allowed me to extract each "confidence" value. I can not seem to figure out how to group the records to calculate the average. Would love to use the built in AVG() function if possible. Thank you in advance for any help you can provide.
Using your example, you can use LATERAL FLATTEN to create your required flattened fields, and then aggregate as you normally would. In this example, I'm grouping on the ID that is in the data, but you could also use y.index or z.index depending on which of those you wanted to group on for your AVG().
WITH x AS (
SELECT PARSE_JSON('{ "entities": [{"count": 2,"disambiguated": {"id": 123},"label": "Coronavirus Disease 2019","mentions": [{"confidence": 0.5928,}, {"confidence": 0.5445,}],"type": "MEDICAL"}]}') as json_str
)
SELECT
y.value:disambiguated.id as id,
avg(z.value:confidence)
from x,
LATERAL FLATTEN(input => json_str:entities) y,
LATERAL FLATTEN(input => y.value:mentions) z
GROUP BY id
;
I have a table with a JSON text field:
create table breaches(breach_id int, detail text);
insert into breaches values
( 1,'[{"breachedState": null},
{"breachedState": "PROCESS_APPLICATION",}]')
I'm trying to use MSSQL's in build JSON parsing functions to test whether ANY object in a JSON array has a matching member value.
If the detail field was a single JSON object, I could use:
select * from breaches
where JSON_VALUE(detail,'$.breachedState') = 'PROCESS_APPLICATION'
but it's an Array, and I want to know if ANY Object has breachedState = 'PROCESS_APPLICATION'
Is this possible using MSSQL's JSON functions?
You can use function OPENJSON to check each object, try this query:
select * from breaches
where exists
(
select *
from
OPENJSON (detail) d
where JSON_VALUE(value,'$.breachedState') = 'PROCESS_APPLICATION'
)
Btw, there is an extra "," in your insert query, it should be:
insert into breaches values
( 1,'[{"breachedState": null},
{"breachedState": "PROCESS_APPLICATION"}]')
How can I remove an object from an array, based on the value of one of the object's keys?
The array is nested within a parent object.
Here's a sample structure:
{
"foo1": [ { "bar1": 123, "bar2": 456 }, { "bar1": 789, "bar2": 42 } ],
"foo2": [ "some other stuff" ]
}
Can I remove an array element based on the value of bar1?
I can query based on the bar1 value using: columnname #> '{ "foo1": [ { "bar1": 123 } ]}', but I've had no luck finding a way to remove { "bar1": 123, "bar2": 456 } from foo1 while keeping everything else intact.
Thanks
Running PostgreSQL 9.6
Assuming that you want to search for a specific object with an inner object of a certain value, and that this specific object can appear anywhere in the array, you need to unpack the document and each of the arrays, test the inner sub-documents for containment and delete as appropriate, then re-assemble the array and the JSON document (untested):
SELECT id, jsonb_build_object(key, jarray)
FROM (
SELECT foo.id, foo.key, jsonb_build_array(bar.value) AS jarray
FROM ( SELECT id, key, value
FROM my_table, jsonb_each(jdoc) ) foo,
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value #> '{"bar1": 123}'::jsonb
GROUP BY 1, 2 ) x
GROUP BY 1;
Now, this may seem a little dense, so picked apart you get:
SELECT id, key, value
FROM my_table, jsonb_each(jdoc)
This uses a lateral join on your table to take the JSON document jdoc and turn it into a set of rows foo(id, key, value) where the value contains the array. The id is the primary key of your table.
Then we get:
SELECT foo.id, foo.key, jsonb_build_array(bar.value) AS jarray
FROM foo, -- abbreviated from above
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value #> '{"bar1": 123}'::jsonb
GROUP BY 1, 2
This uses another lateral join to unpack the arrays into bar(value) rows. These objects can now be searched with the containment operator to remove the objects from the result set: WHERE NOT bar.value #> '{"bar1": 123}'::jsonb. In the select list the arrays are re-assembled by id and key but now without the offending sub-documents.
Finally, in the main query the JSON documents are re-assembled:
SELECT id, jsonb_build_object(key, jarray)
FROM x -- from above
GROUP BY 1;
The important thing to understand is that PostgreSQL JSON functions only operate on the level of the JSON document that you can explicitly indicate. Usually that is the top level of the document, unless you have an explicit path to some level in the document (like {foo1, 0, bar1}, but you don't have that). At that level of operation you can then unpack to do your processing such as removing objects.