Snowflake - OBJECT data type - snowflake-cloud-data-platform

Snowflake's OBJECT data type supports storing semi-structured data (primarily key-value pairs). For example, suppose the following is my dataset (with Parameters being of type OBJECT):
DeviceID, Parameters
D1, { "P1": "100", "P2": "150" }
D2, { "P2": "125", "P3": "200" }
it can be flattened out by using SELECT DeviceID, Parameters['P1'], Parameters['P2'], and the output would be:

DeviceID | P1  | P2
---------+-----+-----
D1       | 100 | 150
D2       | 125 | 200
However, if I want to have the individual elements as rows, what is the best method to do this? For example, if I need the output as below:

DeviceID | ParameterID | ParameterValue
---------+-------------+----------------
D1       | P1          | 100
D1       | P2          | 150
D2       | P1          | 125
D2       | P2          | 200

Using a CTE for the data:

WITH data(DeviceID, Parameters) AS (
    SELECT column1, PARSE_JSON(column2) FROM VALUES
        ('D1', '{ "P1": "100", "P2": "150" }'),
        ('D2', '{ "P2": "125", "P3": "200" }')
)

you want to use the FLATTEN function, wrapped in a TABLE or LATERAL:

SELECT
    d.DeviceID,
    f.key::text AS ParameterID,
    f.value::number AS ParameterValue
FROM data AS d,
    TABLE(FLATTEN(input => d.Parameters)) f

gives:

DEVICEID | PARAMETERID | PARAMETERVALUE
---------+-------------+----------------
D1       | P1          | 100
D1       | P2          | 150
D2       | P2          | 125
D2       | P3          | 200
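To make the FLATTEN semantics concrete outside of Snowflake, here is a minimal Python sketch (illustrative only, not Snowflake's implementation): each key/value pair of the OBJECT becomes its own output row, which is exactly what the lateral FLATTEN produces.

```python
# Sketch of what FLATTEN does to an OBJECT column:
# every key/value pair in the object becomes one output row.
rows = [
    ("D1", {"P1": "100", "P2": "150"}),
    ("D2", {"P2": "125", "P3": "200"}),
]

flattened = [
    (device_id, key, int(value))  # mirrors f.key::text, f.value::number
    for device_id, params in rows
    for key, value in params.items()
]

for r in flattened:
    print(r)
```

Note that, as in the SQL output above, device D2 yields its actual keys P2 and P3; FLATTEN emits whatever pairs exist, it does not normalize keys across rows.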

Related

snowflake pivot attribute values into columns in array of objects

EDIT: I gave bad example data. Updated some details and switched out dummy data for sanitized, actual data.
Source system: Freshdesk via Stitch
Table Structure:
create or replace TABLE TICKETS (
CC_EMAILS VARIANT,
COMPANY VARIANT,
COMPANY_ID NUMBER(38,0),
CREATED_AT TIMESTAMP_TZ(9),
CUSTOM_FIELDS VARIANT,
DUE_BY TIMESTAMP_TZ(9),
FR_DUE_BY TIMESTAMP_TZ(9),
FR_ESCALATED BOOLEAN,
FWD_EMAILS VARIANT,
ID NUMBER(38,0) NOT NULL,
IS_ESCALATED BOOLEAN,
PRIORITY FLOAT,
REPLY_CC_EMAILS VARIANT,
REQUESTER VARIANT,
REQUESTER_ID NUMBER(38,0),
RESPONDER_ID NUMBER(38,0),
SOURCE FLOAT,
SPAM BOOLEAN,
STATS VARIANT,
STATUS FLOAT,
SUBJECT VARCHAR(16777216),
TAGS VARIANT,
TICKET_CC_EMAILS VARIANT,
TYPE VARCHAR(16777216),
UPDATED_AT TIMESTAMP_TZ(9),
_SDC_BATCHED_AT TIMESTAMP_TZ(9),
_SDC_EXTRACTED_AT TIMESTAMP_TZ(9),
_SDC_RECEIVED_AT TIMESTAMP_TZ(9),
_SDC_SEQUENCE NUMBER(38,0),
_SDC_TABLE_VERSION NUMBER(38,0),
EMAIL_CONFIG_ID NUMBER(38,0),
TO_EMAILS VARIANT,
PRODUCT_ID NUMBER(38,0),
GROUP_ID NUMBER(38,0),
ASSOCIATION_TYPE NUMBER(38,0),
ASSOCIATED_TICKETS_COUNT NUMBER(38,0),
DELETED BOOLEAN,
primary key (ID)
);
Note the variant field, "custom_fields". It undergoes an unfortunate transformation between the API and Snowflake. The resulting field contains an array of 3 or more objects, each one a custom field. I do not have the ability to change the data format. Examples:
# values could be null
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "none"
}
]
# or values could have a combination of null and non-null values
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "concern"
}
]
# or they could all have non-null values
[
{
"name": "cf_request",
"value": "issue with timer"
},
{
"name": "cf_related_with",
"value": "timer stopped"
},
{
"name": "cf_question",
"value": "technical problem"
}
]
I would essentially like to pivot these into fields in a select query, where the name attribute's value becomes a column header, making the output similar to the following:
+----+------------------+-----------------+-------------------+-----------------------------+
| id | cf_request | cf_related_with | cf_question | all_other_fields |
+----+------------------+-----------------+-------------------+-----------------------------+
| 5 | issue with timer | timer stopped | technical problem | more data about this ticket |
| 6 | hq | laptop issues | some value | more data |
| 7 | a thing | about a thing | about something | more data |
+----+------------------+-----------------+-------------------+-----------------------------+
Is there a function that searches the values of array objects and returns objects with qualifying values? Something like:
select
id,
get_object_where(name = 'category', value) as category,
get_object_where(name = 'subcategory', value) as category,
get_object_where(name = 'subsubcategory', value) as category
from my_data_table
Unfortunately, PIVOT requires an aggregate function. I tried using min and max, but only get a return of null values. Something similar to this approach would be great if there is another syntax that doesn't require aggregation:
with arr as (
select
id,
cs.value:name col_name,
cs.value:value col_value
from my_data_table,
lateral flatten(input => custom_fields) cs
)
select
*
from arr
pivot(col_name for col_value in ('category', 'subcategory', 'subsubcategory'))
as p (id, category, subcategory, subsubcategory);
It is possible to use the following approach, but it is flawed in that any time a new custom field is added I have to add cases to account for new positions within the array.
select
id,
case
when custom_fields[0]:name = 'cf_request' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_request' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_request' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_request' then custom_fields[3]:value
else null
end cf_request,
case
when custom_fields[0]:name = 'cf_related_with' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_related_with' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_related_with' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_related_with' then custom_fields[3]:value
else null
end cf_related_with,
case
when custom_fields[0]:name = 'cf_question' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_question' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_question' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_question' then custom_fields[3]:value
else null
end cf_question,
created_at
from my_db.my_schema.tickets;
I think you almost had it. You just need to add a max() or min() around your col_name. As you stated, it needs an aggregate function, and something like max() or min() will work here, since it is aggregating on the name/value pairs that you have. If you have 2 subcategory values, for example, it'll pick the min/max value. From your example, that doesn't appear to be an issue, so it'll always choose the value you want. I was able to replicate your scenario with this query:
WITH x AS (
SELECT parse_json('[{"name": "category","value": "Bikes"},{"name": "subcategory","value": "Mountain Bikes"},{"name": "subsubcategory","value": "hardtail bikes"}]')::VARIANT as field_var
),
arr as (
select
seq,
cs.value:name::varchar col_name,
cs.value:value::varchar col_value
from x,
lateral flatten(input => x.field_var) cs
)
select
*
from arr
pivot(max(col_value) for col_name in ('category','subcategory','subsubcategory')) as p (seq, category, subcategory, subsubcategory);
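The pivot logic in that answer can be sketched in plain Python to show why the aggregate is harmless here (an illustration, not Snowflake's implementation): we collect the name/value pairs per row into columns, and max() only matters if a name appears twice.

```python
# Sketch of PIVOT(max(col_value) FOR col_name IN (...)):
# map each wanted name to its value, resolving duplicates with max().
custom_fields = [
    {"name": "category", "value": "Bikes"},
    {"name": "subcategory", "value": "Mountain Bikes"},
    {"name": "subsubcategory", "value": "hardtail bikes"},
]

wanted = ("category", "subcategory", "subsubcategory")
pivoted = {}
for field in custom_fields:
    name, value = field["name"], field["value"]
    if name in wanted:
        # max() stands in for the aggregate that SQL PIVOT requires;
        # with one value per name it simply returns that value
        pivoted[name] = max(pivoted.get(name, value), value)

print([pivoted[name] for name in wanted])
```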

Postgresql select json array into rows and single text

I have query to get result from table like this:
SELECT test_id, content::json->'scenario'
FROM test
And I got this result, with an array of objects in the scenario column:
test_id | scenario
29 | [{"name":"OpenSignal", "task":[{"name":"speedtest"}]}, {"name":"ITest", "task":[{"name":"speedtest"}]}, {"name":"EqualOne", "task":[{"name":"flashtest"}, {"name":"web"}, {"name":"video"}]}]
30 | [{"name":"Speedtest", "task":[{"name":"speedtest"}]}, {"name":"ITest", "task":[{"name":"speedtest"}]}, {"name":"EqualOne", "task":[{"name":"flashtest"}, {"name":"web"}, {"name":"video"}]}]
The object structure is like this:
[{
"name": "OpenSignal",
"task": [{
"name": "speedtest"
}]
}, {
"name": "ITest",
"task": [{
"name": "speedtest"
}]
}, {
"name": "EqualOne",
"task": [{
"name": "flashtest"
}, {
"name": "web"
}, {
"name": "video"
}]
}]
How can I get a result like this:
test_id | scenario
29 | Opensignal-speedtest
29 | ITest-speedtest
29 | EqualOne-flashtest
29 | EqualOne-web
29 | EqualOne-video
30 | Opensignal-speedtest
30 | ITest-speedtest
30 | EqualOne-flashtest
30 | EqualOne-web
30 | EqualOne-video
And
test_id | scenarios
29 | OpenSignal-speedtest,ITest-speedtest,EqualOne-flashtest, EqualOne-web,EqualOne-video
30 | Speedtest-speedtest,ITest-speedtest,EqualOne-flashtest,EqualOne-web,EqualOne-video
Thanks in advance my brothers
For your first query, you could do something like this:
SELECT test_id, CONCAT(sub.element->'name', '-', json_array_elements(sub.element->'task')->'name') as scenario
FROM
(SELECT test_id, json_array_elements(content::json) as element
FROM test) as sub;
I used a subquery to get the elements from your original json, and then I concatenate the name with each task name with a dash.
Then, to easily get them separated per id, I wrapped it in another subquery using the string_agg function:
SELECT test_id,
string_agg(task, ',')
FROM(
SELECT test_id, CONCAT(sub.element->'name', '-', json_array_elements(sub.element->'task')->'name') as task
FROM
(SELECT test_id, json_array_elements(content::json) as element
FROM test) as sub
)as tasks
GROUP BY test_id
Sorry if it looks a bit messy; here is an SQLFiddle link you can use:
http://sqlfiddle.com/#!17/fcb27/38
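The two-step shape of that query, unnest the array, concatenate scenario and task names, then aggregate per id, can be mimicked in Python as a rough sketch (illustrative only):

```python
import json

# Sketch of the two-step query: unnest the scenario array
# (json_array_elements), build "<scenario>-<task>" strings (CONCAT),
# then aggregate them into one string (string_agg).
scenario_json = """[
  {"name": "OpenSignal", "task": [{"name": "speedtest"}]},
  {"name": "ITest",      "task": [{"name": "speedtest"}]},
  {"name": "EqualOne",   "task": [{"name": "flashtest"},
                                  {"name": "web"},
                                  {"name": "video"}]}
]"""

pairs = [
    f"{scenario['name']}-{task['name']}"       # CONCAT(name, '-', task name)
    for scenario in json.loads(scenario_json)  # outer json_array_elements
    for task in scenario["task"]               # inner json_array_elements
]

print(pairs)            # one row per scenario-task pair (first result set)
print(",".join(pairs))  # string_agg(task, ',') per test_id (second result set)
```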

Concatenate string values of objects in a postgresql json array

I have a PostgreSQL table with a json column filled with objects nested in an array. Now I want to build a query that returns the id and the concatenated string values of the objects in the json column. The PostgreSQL version is 9.5.
Example data
CREATE TABLE test
(
id integer,
data json
);
INSERT INTO test (id, data) VALUES (1, '{
"info":"a",
"items":[
{ "name":"a_1" },
{ "name":"a_2" },
{ "name":"a_3" }
]
}');
INSERT INTO test (id, data) VALUES (2, '{
"info":"b",
"items":[
{ "name":"b_1" },
{ "name":"b_2" },
{ "name":"b_3" }
]
}');
INSERT INTO test (id, data) VALUES (3, '{
"info":"c",
"items":[
{ "name":"c_1" },
{ "name":"c_2" },
{ "name":"c_3" }
]
}');
Example that's not quite working as intended
So far I've been able to get the values from the table, unfortunately without the strings being added to one another.
SELECT
row.id,
item ->> 'name'
FROM
test as row,
json_array_elements(row.data #> '{items}' ) as item;
Which will output:
id | names
----------
1 | a_1
1 | a_2
1 | a_3
2 | b_1
2 | b_2
2 | b_3
3 | c_1
3 | c_2
3 | c_3
Intended output example
How would a query look like that returns this output?
id | names
----------
1 | a_1, a_2, a_3
2 | b_1, b_2, b_3
3 | c_1, c_2, c_3
SQL Fiddle link
Your original attempt was missing a group by step
This should work:
SELECT
id
, STRING_AGG(item->>'name', ', ')
FROM
test,
json_array_elements(test.data->'items') as item
GROUP BY 1
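The expand-then-group pattern in that answer can be sketched in Python (an illustration of the logic, not of PostgreSQL internals): unnest each row's items array, then join the names back together per id.

```python
import json
from collections import defaultdict

# Sketch of json_array_elements + STRING_AGG ... GROUP BY:
# expand each row's items array into (id, name) pairs,
# then join the names per id.
table = [
    (1, '{"info": "a", "items": [{"name": "a_1"}, {"name": "a_2"}, {"name": "a_3"}]}'),
    (2, '{"info": "b", "items": [{"name": "b_1"}, {"name": "b_2"}, {"name": "b_3"}]}'),
]

grouped = defaultdict(list)
for row_id, data in table:
    for item in json.loads(data)["items"]:  # json_array_elements(data->'items')
        grouped[row_id].append(item["name"])

# STRING_AGG(item->>'name', ', ') ... GROUP BY id
result = {row_id: ", ".join(names) for row_id, names in grouped.items()}
print(result)
```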
Changing the SQL for the second column into an ARRAY subquery should give the required results:
SELECT
row.id, ARRAY (SELECT item ->> 'name'
FROM
test as row1,
json_array_elements(row.data #> '{items}' ) as item WHERE row.id=row1.id)
FROM
test as row;

Display JSON array elements as one line in U-SQL

How do I display each JSON array element as a comma-separated element on one line, rather than one element per line, in U-SQL?
For example, the JSON file is:
{
"A": {
"A1": "1",
"A2": 0
},
"B": {
"B1": "1",
"B2": 0
},
"C": {
"C1": [
{
"D1": "1"
},
{
"D2": "2"
},
{
"D3": "3"
},
{
"D4": "4"
},
{
"D5": "5"
},
{
"D6": "6"
},
{
"D7": "7"
}
]
}
}
The code to process this fragment for the array C1 is as follows:
#sql = SELECT
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C)["C1"] AS C1_array
FROM #json;
OUTPUT #sql TO "test.txt" USING Outputters.Csv(quoting: false);
#sql2 = SELECT
Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1_array) AS C1
FROM #sql
CROSS APPLY
EXPLODE (Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1_array).Values) AS D(C1);
#result =
SELECT C1["D1"] AS D1,
C1["D2"] AS D2,
C1["D3"] AS D3,
C1["D4"] AS D4,
C1["D5"] AS D5,
C1["D6"] AS D6,
C1["D7"] AS D7
FROM #sql2;
OUTPUT #result TO "output.txt" USING Outputters.Text();
The result is that all the array elements print out one per line, i.e., the D1 through D7 elements end up on separate lines. I want the D1 through D7 elements to be part of the same line, since they are part of the same JSON object.
That is:
1, 2, 3, 4, 5, 6, 7
How can this be done?
The important part is that the C1 array contains one item per Di. So if you treat it as an item per row, you will get separate rows. In this case you want one row for all of C1.
The following does this in two ways: One time you know what the Ds are and one time if you do not know and still want them in one row (now all in one cell).
REFERENCE ASSEMBLY JSONBlog.[Newtonsoft.Json];
REFERENCE ASSEMBLY JSONBlog.[Microsoft.Analytics.Samples.Formats];
// Get one row per C and get the C1 array as column
#d = EXTRACT C1 string FROM "/Temp/ABCD.txt" USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor("C");
// Keep one row per C and get all the items from within the C1 array
#d =
SELECT Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple(C1, "[*].*") AS DMap
FROM #d;
// Get individual items
#d1 =
SELECT
DMap["[0].D1"] AS D1,
DMap["[1].D2"] AS D2,
DMap["[2].D3"] AS D3,
DMap["[3].D4"] AS D4,
DMap["[4].D5"] AS D5,
DMap["[5].D6"] AS D6,
DMap["[6].D7"] AS D7
FROM #d;
// Keep it generic and get all item in a single column
#d2 =
SELECT String.Join("\t", DMap.Values) AS Ds
FROM #d;
OUTPUT #d1
TO "/Temp/D-Out1.tsv"
USING Outputters.Tsv();
OUTPUT #d2
TO "/Temp/D-Out2.tsv"
USING Outputters.Tsv(quoting:false);
As you can see, the JsonTuple function can take a JSONPath expression and then it uses all found paths in the resulting map as keys.
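The shape of the map that JsonTuple(C1, "[*].*") produces can be sketched in Python (an illustration of the idea, not the actual U-SQL library): the keys are "[index].fieldname" paths, and the generic variant of the answer just joins the map's values into one line.

```python
# Sketch of JsonTuple(C1, "[*].*"): build a map keyed by
# "[index].fieldname" paths, then either look up individual
# entries (as in #d1) or join all values into one line (as in #d2).
c1 = [{"D1": "1"}, {"D2": "2"}, {"D3": "3"}, {"D4": "4"},
      {"D5": "5"}, {"D6": "6"}, {"D7": "7"}]

dmap = {
    f"[{i}].{key}": value
    for i, obj in enumerate(c1)
    for key, value in obj.items()
}

print(dmap["[0].D1"])            # individual lookup, as in #d1
print("\t".join(dmap.values()))  # String.Join("\t", DMap.Values), as in #d2
```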

Is there a jsonb array overlap function for postgres?

I am not able to extract and compare two arrays from jsonb in Postgres to do an overlap check. Is there a working function for this?
Example in people_favorite_color table:
{
"person_id":1,
"favorite_colors":["red","orange","yellow"]
}
{
"person_id":2,
"favorite_colors":["yellow","green","blue"]
}
{
"person_id":3,
"favorite_colors":["black","white"]
}
Array overlap postgres tests:
select
p1.json_data->>'person_id',
p2.json_data->>'person_id',
p1.json_data->'favorite_colors' && p2.json_data->'favorite_colors'
from people_favorite_color p1 join people_favorite_color p2 on (1=1)
where p1.json_data->>'person_id' < p2.json_data->>'person_id'
Expected results:
p1.id;p2.id;likes_same_color
1;2;t
1;3;f
2;3;f
--edit--
Attempting to cast to text[] results in an error:
select
('{
"person_id":3,
"favorite_colors":["black","white"]
}'::jsonb->>'favorite_colors')::text[];
ERROR: malformed array literal: "["black", "white"]"
DETAIL: "[" must introduce explicitly-specified array dimensions.
Use array_agg() and jsonb_array_elements_text() to convert the jsonb array to a text array:
with the_data as (
select id, array_agg(color) colors
from (
select json_data->'person_id' id, color
from
people_favorite_color,
jsonb_array_elements_text(json_data->'favorite_colors') color
) sub
group by 1
)
select p1.id, p2.id, p1.colors && p2.colors like_same_colors
from the_data p1
join the_data p2 on p1.id < p2.id
order by 1, 2;
id | id | like_same_colors
----+----+------------------
1 | 2 | t
1 | 3 | f
2 | 3 | f
(3 rows)
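The overlap check itself, the && operator on the aggregated text arrays, is just a non-empty-intersection test, which a short Python sketch makes explicit (illustrative only):

```python
import json
from itertools import combinations

# Sketch of the overlap check: turn each jsonb array into a set of
# colors, then test pairwise intersection (what && does on arrays),
# keeping only pairs where p1.id < p2.id.
people = [
    '{"person_id": 1, "favorite_colors": ["red", "orange", "yellow"]}',
    '{"person_id": 2, "favorite_colors": ["yellow", "green", "blue"]}',
    '{"person_id": 3, "favorite_colors": ["black", "white"]}',
]

colors = {
    doc["person_id"]: set(doc["favorite_colors"])
    for doc in map(json.loads, people)
}

results = [
    (p1, p2, bool(colors[p1] & colors[p2]))  # colors overlap?
    for p1, p2 in combinations(sorted(colors), 2)
]
print(results)
```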
