Snowflake: pivot attribute values into columns in an array of objects

EDIT: I gave bad example data. Updated some details and switched out dummy data for sanitized, actual data.
Source system: Freshdesk via Stitch
Table Structure:
create or replace TABLE TICKETS (
CC_EMAILS VARIANT,
COMPANY VARIANT,
COMPANY_ID NUMBER(38,0),
CREATED_AT TIMESTAMP_TZ(9),
CUSTOM_FIELDS VARIANT,
DUE_BY TIMESTAMP_TZ(9),
FR_DUE_BY TIMESTAMP_TZ(9),
FR_ESCALATED BOOLEAN,
FWD_EMAILS VARIANT,
ID NUMBER(38,0) NOT NULL,
IS_ESCALATED BOOLEAN,
PRIORITY FLOAT,
REPLY_CC_EMAILS VARIANT,
REQUESTER VARIANT,
REQUESTER_ID NUMBER(38,0),
RESPONDER_ID NUMBER(38,0),
SOURCE FLOAT,
SPAM BOOLEAN,
STATS VARIANT,
STATUS FLOAT,
SUBJECT VARCHAR(16777216),
TAGS VARIANT,
TICKET_CC_EMAILS VARIANT,
TYPE VARCHAR(16777216),
UPDATED_AT TIMESTAMP_TZ(9),
_SDC_BATCHED_AT TIMESTAMP_TZ(9),
_SDC_EXTRACTED_AT TIMESTAMP_TZ(9),
_SDC_RECEIVED_AT TIMESTAMP_TZ(9),
_SDC_SEQUENCE NUMBER(38,0),
_SDC_TABLE_VERSION NUMBER(38,0),
EMAIL_CONFIG_ID NUMBER(38,0),
TO_EMAILS VARIANT,
PRODUCT_ID NUMBER(38,0),
GROUP_ID NUMBER(38,0),
ASSOCIATION_TYPE NUMBER(38,0),
ASSOCIATED_TICKETS_COUNT NUMBER(38,0),
DELETED BOOLEAN,
primary key (ID)
);
Note the variant field, "custom_fields". It undergoes an unfortunate transformation between the API and Snowflake. The resulting field contains an array of three or more objects, each one a custom field. I do not have the ability to change the data format. Examples:
# all values could be empty ("none")
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "none"
}
]
# or a mix of empty and real values
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "concern"
}
]
# or all real values
[
{
"name": "cf_request",
"value": "issue with timer"
},
{
"name": "cf_related_with",
"value": "timer stopped"
},
{
"name": "cf_question",
"value": "technical problem"
}
]
I would essentially like to pivot these into fields in a select query, where the name attribute's value becomes a column header, making the output similar to the following:
+----+------------------+-----------------+-------------------+-----------------------------+
| id | cf_request | cf_related_with | cf_question | all_other_fields |
+----+------------------+-----------------+-------------------+-----------------------------+
| 5 | issue with timer | timer stopped | technical problem | more data about this ticket |
| 6 | hq | laptop issues | some value | more data |
| 7 | a thing | about a thing | about something | more data |
+----+------------------+-----------------+-------------------+-----------------------------+
Is there a function that searches the values of array objects and returns objects with qualifying values? Something like:
select
id,
get_object_where(name = 'category', value) as category,
get_object_where(name = 'subcategory', value) as subcategory,
get_object_where(name = 'subsubcategory', value) as subsubcategory
from my_data_table
Unfortunately, PIVOT requires an aggregate function; I tried using min and max but only got null values back. Something similar to this approach would be great if there is another syntax that doesn't require aggregation:
with arr as (
select
id,
cs.value:name col_name,
cs.value:value col_value
from my_data_table,
lateral flatten(input => custom_fields) cs
)
select
*
from arr
pivot(col_name for col_value in ('category', 'subcategory', 'subsubcategory'))
as p (id, category, subcategory, subsubcategory);
It is possible to use the following approach, but it is flawed in that any time a new custom field is added I have to add cases to account for new positions within the array.
select
id,
case
when custom_fields[0]:name = 'cf_request' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_request' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_request' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_request' then custom_fields[3]:value
else null
end cf_request,
case
when custom_fields[0]:name = 'cf_related_with' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_related_with' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_related_with' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_related_with' then custom_fields[3]:value
else null
end cf_related_with,
case
when custom_fields[0]:name = 'cf_question' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_question' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_question' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_question' then custom_fields[3]:value
else null
end cf_question,
created_at
from my_db.my_schema.tickets;

I think you almost had it. You just need to aggregate col_value with max() or min() and pivot on col_name. As you stated, PIVOT needs an aggregate function, and max() or min() works here because it aggregates over the name/value pairs you already extracted. If you had two subcategory values, for example, it would pick the min/max of them; from your example that doesn't appear to be an issue, so it will always return the value you want. I was able to replicate your scenario with this query:
WITH x AS (
SELECT parse_json('[{"name": "category","value": "Bikes"},{"name": "subcategory","value": "Mountain Bikes"},{"name": "subsubcategory","value": "hardtail bikes"}]')::VARIANT as field_var
),
arr as (
select
seq,
cs.value:name::varchar col_name,
cs.value:value::varchar col_value
from x,
lateral flatten(input => x.field_var) cs
)
select
*
from arr
pivot(max(col_value) for col_name in ('category','subcategory','subsubcategory')) as p (seq, category, subcategory, subsubcategory);
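Applied to the tickets table from the question, the same pattern might look like the following; just a sketch, assuming the custom field names cf_request, cf_related_with and cf_question from the sample data:
with arr as (
    select
        t.id,
        cs.value:name::varchar  as col_name,
        cs.value:value::varchar as col_value
    from tickets t,
        lateral flatten(input => t.custom_fields) cs
)
select *
from arr
    pivot(max(col_value) for col_name in ('cf_request', 'cf_related_with', 'cf_question'))
    as p (id, cf_request, cf_related_with, cf_question);
Adding a new custom field then only means adding its name to the IN list and the column alias list, rather than new positional CASE branches.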

Related

Compare multiple date fields in JSON and use them in where clause

So I have a text field in my Postgres 10.8 DB (json_array_elements not possible). It has a JSON structure like this:
{
"code_cd": "02",
"tax_cd": null,
"earliest_exit_date": [
{
"date": "2023-03-31",
"_destroy": ""
},
{
"date": "2021-11-01",
"_destroy": ""
},
{
"date": "2021-12-21",
"_destroy": ""
}
],
"enter_date": null,
"leave_date": null
}
earliest_exit_date can also be empty, like this:
{
"code_cd": "02",
"tax_cd": null,
"earliest_exit_date":[],
"enter_date": null,
"leave_date": null
}
Now I want to get the earliest_exit_date back where the date is after current_date and is the closest one to current_date. From the example with earliest_exit_date, the output should be: 2021-12-21
Does anyone know how to do this?
If your table has a unique value or an id column, you can use the query below:
Sample table and data structure: dbfiddle
select distinct
id,
min("date") filter (where "date" > current_date) over (partition by id)
from
test t
cross join jsonb_to_recordset(t.data::jsonb -> 'earliest_exit_date') as e("date" date)
order by id
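If every row already has a unique id, a plain GROUP BY (no window function) gives the same result. A sketch, assuming the same test table (id, data) as in the fiddle:
select
    t.id,
    min(e."date") filter (where e."date" > current_date) as next_exit_date
from
    test t
    cross join jsonb_to_recordset(t.data::jsonb -> 'earliest_exit_date') as e("date" date)
group by t.id
order by t.id;
Rows whose earliest_exit_date array is empty drop out of the join, just as in the query above.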

Postgresql update jsonb keys recursively

Given the following data model:
create table test
(
id int primary key,
js jsonb
);
insert into test values (1, '{"id": "total", "price": 400, "breakdown": [{"id": "product1", "price": 400}] }');
insert into test values (2, '{"id": "total", "price": 1000, "breakdown": [{"id": "product1", "price": 400}, {"id": "product2", "price": 600}]}');
I need to rename all the price keys to cost.
It is easy to do that on the top-level field, using:
update test
set js = jsonb_set(js #- '{price}', '{cost}', js #> '{price}');
result:
1 {"id": "total", "cost": 1000, "breakdown": [{"id": "product1", "price": 400}]}
2 {"id": "total", "cost": 2000, "breakdown": [{"id": "product1", "price": 400}, {"id": "product2", "price": 600}]}
But I also need to do this inside the breakdown array.
How can I do this without knowing the number of items in the breakdown array?
In other words, how can I apply a function in place on every element from a jsonb array.
Thank you!
SOLUTION 1: clean but heavy
First you create an aggregate function similar to jsonb_set:
CREATE OR REPLACE FUNCTION jsonb_set(x jsonb, y jsonb, _path text[], _key text, _val jsonb, create_missing boolean DEFAULT True)
RETURNS jsonb LANGUAGE sql IMMUTABLE AS
$$
SELECT jsonb_set(COALESCE(x, y), COALESCE(_path, '{}' :: text[]) || _key, COALESCE(_val, 'null' :: jsonb), create_missing) ;
$$ ;
DROP AGGREGATE IF EXISTS jsonb_set_agg (jsonb, text[], text, jsonb, boolean) CASCADE ;
CREATE AGGREGATE jsonb_set_agg (jsonb, text[], text, jsonb, boolean)
(
sfunc = jsonb_set
, stype = jsonb
) ;
Then you call the aggregate function while iterating over the jsonb array elements:
WITH list AS (
SELECT id, jsonb_set_agg(js #- ('{breakdown,' || ind || ',price}')::text[], ('{breakdown,' || ind || '}')::text[], 'cost', js #> ('{breakdown,' || ind || ',price}')::text[], true) AS js
FROM test
CROSS JOIN LATERAL generate_series(0, jsonb_array_length(js -> 'breakdown') - 1) AS ind
GROUP BY id)
UPDATE test AS t
SET js = jsonb_set(l.js #- '{price}', '{cost}', l.js #> '{price}')
FROM list AS l
WHERE t.id = l.id ;
SOLUTION 2: quick and dirty
You simply convert the jsonb to text and replace the substring 'price' with 'cost':
UPDATE test
SET js = replace(js :: text, 'price', 'cost') :: jsonb
In the general case, this solution will replace the substring 'price' even inside jsonb string values and inside jsonb keys that merely contain the substring 'price'. To reduce the risk you can replace the substring '"price":' with '"cost":' (jsonb's text form puts the colon right after the key), but the risk still exists.
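A slightly narrower variant of the same text-level idea is to rewrite only key positions with regexp_replace. This is just a sketch, and it can still be fooled by a string value that happens to contain "price" followed by a colon:
UPDATE test
SET js = regexp_replace(js :: text, '"price"\s*:', '"cost":', 'g') :: jsonb;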
This query is simple and makes it easy to change the field. You can see my query structure here: dbfiddle
update test u_t
set js = tmp.new_js
from (
select t.id,
(t.js || jsonb_build_object('cost', t.js -> 'price')) - 'price'
||
jsonb_build_object('breakdown', jsonb_agg(
(b.value || jsonb_build_object('cost', b.value -> 'price')) - 'price')) as new_js
from test t
cross join jsonb_array_elements(t.js -> 'breakdown') b
group by t.id) tmp
where u_t.id = tmp.id;
Another way to replace a jsonb key in all the jsonb objects of a jsonb array:
My query disaggregates the jsonb array. For each object, if the price key exists, it removes the price key from the object and adds a new cost key with the old price value; it then builds a new jsonb array from the modified objects. Finally it replaces the old jsonb array with the new one.
WITH cte AS (SELECT id, jsonb_agg(CASE WHEN item ? 'price'
THEN jsonb_set(item - 'price', '{"cost"}', item -> 'price')
ELSE item END) AS cost_array
FROM test
CROSS JOIN jsonb_array_elements(js -> 'breakdown') WITH ORDINALITY arr(item, index)
GROUP BY id)
UPDATE test
SET js = jsonb_set(js, '{breakdown}', cte.cost_array, false)
FROM cte
WHERE cte.id = test.id;
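Whichever variant you use, a quick way to eyeball the result on the sample rows (just a sketch):
SELECT id, jsonb_pretty(js) FROM test ORDER BY id;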

Postgres jsonb cast recordset from UNIX to timestamp

I'm working with a Postgres table that has a jsonb column. I've been able to create a recordset to turn the JSON in the jsonb object into rows. I'm struggling to convert the timestamps from UNIX epoch values into readable timestamps.
This is what the jsonb object looks like, with the timestamps stored as UNIX epochs:
{
"signal": [
{
"id": "e80",
"on": true,
"unit": "sample 1",
"timestamp": 1521505355
},
{
"id": "97d",
"on": false,
"unit": "sample 2",
"timestamp": 1521654433
},
{
"id": "97d",
"on": false,
"unit": "sample 3",
"timestamp": 1521654433
}
]
}
Ideally I'd like it to look like this, but I get an error for the timestamp:
id  | on    | unit     | timestamp
----+-------+----------+---------------------------
e80 | true  | sample 1 | 2018-03-20 00:22:35+00:00
97d | false | sample 2 | 2018-03-21 17:47:13+00:00
97d | false | sample 3 | 2018-03-21 17:47:13+00:00
This is what I have so far; it returns the expected values for the other columns but gives an error for the timestamp column:
select b.*
from device d
cross join lateral jsonb_to_recordset(d.events->'signal') as
b("id" integer, "on" boolean, "unit" text, "timestamp" timestamp)
The timestamp datatype is throwing an error:
[22008] ERROR: date/time field value out of range
Any help or suggestions for casting the timestamp from UNIX to an actual timestamp is greatly appreciated.
You may specify it as INTEGER in the column definition list and then convert it to TIMESTAMP using TO_TIMESTAMP.
Furthermore, the id which you are trying to define can't be an integer; values like 'e80' only fit a text column.
SQL Fiddle
Query 1:
SELECT b.id
,b.ON
,b.unit
,to_timestamp("timestamp") AS "timestamp"
FROM device d
CROSS JOIN lateral jsonb_to_recordset(d.events -> 'signal')
AS b("id" TEXT, "on" boolean, "unit" TEXT, "timestamp" INT)
Results:
| id | on | unit | timestamp |
|-----|-------|----------|----------------------|
| e80 | true | sample 1 | 2018-03-20T00:22:35Z |
| 97d | false | sample 2 | 2018-03-21T17:47:13Z |
| 97d | false | sample 3 | 2018-03-21T17:47:13Z |
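One detail worth noting: to_timestamp() returns timestamptz, which is rendered in the session time zone. To pin the values to UTC regardless of the session setting, one option (a sketch of the same query; the result is a plain timestamp holding the UTC wall-clock time) is to shift them explicitly:
SELECT b.id,
       b."on",
       b.unit,
       to_timestamp(b."timestamp") AT TIME ZONE 'UTC' AS "timestamp"
FROM device d
CROSS JOIN LATERAL jsonb_to_recordset(d.events -> 'signal')
     AS b("id" text, "on" boolean, "unit" text, "timestamp" int);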

Concatenate string values of objects in a postgresql json array

I have a PostgreSQL table with a json column filled with objects nested in an array. Now I want to build a query that returns the id and the concatenated string values of the objects in the json column. The PostgreSQL version is 9.5.
Example data
CREATE TABLE test
(
id integer,
data json
);
INSERT INTO test (id, data) VALUES (1, '{
"info":"a",
"items":[
{ "name":"a_1" },
{ "name":"a_2" },
{ "name":"a_3" }
]
}');
INSERT INTO test (id, data) VALUES (2, '{
"info":"b",
"items":[
{ "name":"b_1" },
{ "name":"b_2" },
{ "name":"b_3" }
]
}');
INSERT INTO test (id, data) VALUES (3, '{
"info":"c",
"items":[
{ "name":"c_1" },
{ "name":"c_2" },
{ "name":"c_3" }
]
}');
Example that doesn't quite work as intended
So far I've been able to get the values out of the table, unfortunately without the strings being concatenated.
SELECT
row.id,
item ->> 'name'
FROM
test as row,
json_array_elements(row.data #> '{items}' ) as item;
Which will output:
id | names
----------
1 | a_1
1 | a_2
1 | a_3
2 | b_1
2 | b_2
2 | b_3
3 | c_1
3 | c_2
3 | c_3
Intended output example
What would a query that returns this output look like?
id | names
----------
1 | a_1, a_2, a_3
2 | b_1, b_2, b_3
3 | c_1, c_2, c_3
SQL Fiddle link
Your original attempt was just missing a GROUP BY step.
This should work:
SELECT
id
, STRING_AGG(item->>'name', ', ')
FROM
test,
json_array_elements(test.data->'items') as item
GROUP BY 1
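If the order of the concatenated names matters, STRING_AGG accepts an ORDER BY inside the aggregate. A sketch of the same query:
SELECT
    id
    , STRING_AGG(item->>'name', ', ' ORDER BY item->>'name')
FROM
    test,
    json_array_elements(test.data->'items') as item
GROUP BY 1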
Changing the SQL for the second column into an ARRAY subquery should also give the required results:
SELECT
    row.id,
    ARRAY (SELECT item ->> 'name'
           FROM test as row1,
                json_array_elements(row.data #> '{items}') as item
           WHERE row.id = row1.id)
FROM
    test as row;

How to delete array element in JSONB column based on nested key value?

How can I remove an object from an array, based on the value of one of the object's keys?
The array is nested within a parent object.
Here's a sample structure:
{
"foo1": [ { "bar1": 123, "bar2": 456 }, { "bar1": 789, "bar2": 42 } ],
"foo2": [ "some other stuff" ]
}
Can I remove an array element based on the value of bar1?
I can query based on the bar1 value using: columnname @> '{ "foo1": [ { "bar1": 123 } ]}', but I've had no luck finding a way to remove { "bar1": 123, "bar2": 456 } from foo1 while keeping everything else intact.
Thanks
Running PostgreSQL 9.6
Assuming that you want to search for a specific object with an inner object of a certain value, and that this specific object can appear anywhere in the array, you need to unpack the document and each of the arrays, test the inner sub-documents for containment and delete as appropriate, then re-assemble the array and the JSON document (untested):
SELECT id, jsonb_object_agg(key, jarray)
FROM (
SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
FROM ( SELECT id, key, value
FROM my_table, jsonb_each(jdoc) ) foo,
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
GROUP BY 1, 2 ) x
GROUP BY 1;
Now, this may seem a little dense, so picked apart you get:
SELECT id, key, value
FROM my_table, jsonb_each(jdoc)
This uses a lateral join on your table to take the JSON document jdoc and turn it into a set of rows foo(id, key, value) where the value contains the array. The id is the primary key of your table.
Then we get:
SELECT foo.id, foo.key, jsonb_agg(bar.value) AS jarray
FROM foo, -- abbreviated from above
jsonb_array_elements(foo.value) AS bar (value)
WHERE NOT bar.value @> '{"bar1": 123}'::jsonb
GROUP BY 1, 2
This uses another lateral join to unpack the arrays into bar(value) rows. These objects can now be tested with the containment operator to drop the matching objects from the result set: WHERE NOT bar.value @> '{"bar1": 123}'::jsonb. In the select list the arrays are re-assembled per id and key with jsonb_agg, but now without the offending sub-documents.
Finally, in the main query the JSON documents are re-assembled:
SELECT id, jsonb_object_agg(key, jarray)
FROM x -- from above
GROUP BY 1;
The important thing to understand is that PostgreSQL JSON functions only operate on the level of the JSON document that you can explicitly indicate. Usually that is the top level of the document, unless you have an explicit path to some level in the document (like {foo1, 0, bar1}, but you don't have that). At that level of operation you can then unpack to do your processing such as removing objects.
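If the array that needs editing always lives under a known key (foo1 in the sample), a more targeted variant is possible. This is only a sketch, assuming a table named my_table with a jsonb column jdoc, as in the answer above:
UPDATE my_table
SET jdoc = jsonb_set(
        jdoc,
        '{foo1}',
        -- COALESCE keeps an empty array (instead of NULL) if every element matched
        COALESCE(
            (SELECT jsonb_agg(elem)
             FROM jsonb_array_elements(jdoc -> 'foo1') AS elem
             WHERE NOT elem @> '{"bar1": 123}'::jsonb),
            '[]'::jsonb))
WHERE jdoc -> 'foo1' @> '[{"bar1": 123}]'::jsonb;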
