I know there are a bunch of similar questions, but I still couldn't find one that covers my specific problem.
I have a Postgres table that looks like this:
|----|----------------------------------------|---------------------|
| id | data                                   | timestamp           |
|----|----------------------------------------|---------------------|
| 1  | [{"key1": "value1", "key2": "value2"}  | 2020-06-09 13:15:00 |
|    | ,{"key1": "value1", "key2": "value2"}] |                     |
|----|----------------------------------------|---------------------|
| 2  | [{"key1": "value1", "key2": "value2"}  | 2020-06-09 13:20:00 |
|    | ,{"key1": "value1", "key2": "value2"}] |                     |
|----|----------------------------------------|---------------------|
I want to create a view on this table so that each key in the jsonb data column gets its own column.
I played around with the jsonb_array_elements and jsonb_each functions, but I could not get it to work.
Ideally, this view would be generated generically for any number of key names.
Any help is greatly appreciated :)
Edit:
I have the following example structure for my jsonb array:
[{
"key1": "value",
"id": "fac30fe9-a39c-445a-84de-637a199f1dfa",
"subobject1": {
"subkey1": "subvalue1",
"subkey2": "subvalue2"
},
"key3": "value3"
},
{
"key1": "value",
"id": "fac30fe9-a39c-445a-84de-637a199f1dfa",
"subobject1": {
"subkey1": "subvalue1",
"subkey2": "subvalue2"
},
"key3": "value3"
}
]
I think I figured out a solution that I can live with for now:
CREATE VIEW viewX as
SELECT id,
messStellen ->> 'uuid' as uuid
FROM (
WITH
A AS (
SELECT
Id
,jsonb_array_elements(data) AS messStellen
FROM "3"
)
SELECT *
FROM A
) x
However, I would be grateful if anyone could point me in the right direction on how I could template such a query so that it also applies to a table where the data field consists of JSON objects with other keys.
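One way to template this is to generate the view definition from a list of key names. The following is only a minimal sketch, not a tested solution for your schema: the helper names (quote_ident, quote_literal, build_view_sql) are made up for illustration, and it assumes the table has an id column and a jsonb column called data, as in the example above. A key that is itself called id would need a different alias, because the view already exposes id.

```python
def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_view_sql(view_name: str, table_name: str, keys: list[str]) -> str:
    """Build a CREATE VIEW statement with one text column per JSON key.

    Assumption: the table has columns "id" and "data" (a jsonb array of objects).
    """
    columns = ",\n".join(
        f"       elem ->> {quote_literal(key)} AS {quote_ident(key)}"
        for key in keys
    )
    return (
        f"CREATE VIEW {quote_ident(view_name)} AS\n"
        f"SELECT t.id,\n{columns}\n"
        f"FROM {quote_ident(table_name)} AS t\n"
        f"CROSS JOIN LATERAL jsonb_array_elements(t.data) AS elem;"
    )


# Hypothetical key list; in practice it could come from a config file.
sql = build_view_sql("viewX", "3", ["uuid", "key1", "key3"])
print(sql)
```

The generated statement unnests the array once with jsonb_array_elements and reads each key with ->>, so every column comes back as text. Nested objects such as subobject1 would need their own path expressions.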
EDIT: I gave bad example data. Updated some details and switched out dummy data for sanitized, actual data.
Source system: Freshdesk via Stitch
Table Structure:
create or replace TABLE TICKETS (
CC_EMAILS VARIANT,
COMPANY VARIANT,
COMPANY_ID NUMBER(38,0),
CREATED_AT TIMESTAMP_TZ(9),
CUSTOM_FIELDS VARIANT,
DUE_BY TIMESTAMP_TZ(9),
FR_DUE_BY TIMESTAMP_TZ(9),
FR_ESCALATED BOOLEAN,
FWD_EMAILS VARIANT,
ID NUMBER(38,0) NOT NULL,
IS_ESCALATED BOOLEAN,
PRIORITY FLOAT,
REPLY_CC_EMAILS VARIANT,
REQUESTER VARIANT,
REQUESTER_ID NUMBER(38,0),
RESPONDER_ID NUMBER(38,0),
SOURCE FLOAT,
SPAM BOOLEAN,
STATS VARIANT,
STATUS FLOAT,
SUBJECT VARCHAR(16777216),
TAGS VARIANT,
TICKET_CC_EMAILS VARIANT,
TYPE VARCHAR(16777216),
UPDATED_AT TIMESTAMP_TZ(9),
_SDC_BATCHED_AT TIMESTAMP_TZ(9),
_SDC_EXTRACTED_AT TIMESTAMP_TZ(9),
_SDC_RECEIVED_AT TIMESTAMP_TZ(9),
_SDC_SEQUENCE NUMBER(38,0),
_SDC_TABLE_VERSION NUMBER(38,0),
EMAIL_CONFIG_ID NUMBER(38,0),
TO_EMAILS VARIANT,
PRODUCT_ID NUMBER(38,0),
GROUP_ID NUMBER(38,0),
ASSOCIATION_TYPE NUMBER(38,0),
ASSOCIATED_TICKETS_COUNT NUMBER(38,0),
DELETED BOOLEAN,
primary key (ID)
);
Note the VARIANT field "custom_fields". It undergoes an unfortunate transformation between the API and Snowflake. The resulting field contains an array of 3 or more objects, each one a custom field. I do not have the ability to change the data format. Examples:
# values could be null
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "none"
}
]
# or values could have a combination of null and non-null values
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "concern"
}
]
# or they could all have non-null values
[
{
"name": "cf_request",
"value": "issue with timer"
},
{
"name": "cf_related_with",
"value": "timer stopped"
},
{
"name": "cf_question",
"value": "technical problem"
}
]
I would essentially like to pivot these into fields in a select query where the name attribute's value becomes a column header. Making the output similar to the following:
+----+------------------+-----------------+-------------------+-----------------------------+
| id | cf_request | cf_related_with | cf_question | all_other_fields |
+----+------------------+-----------------+-------------------+-----------------------------+
| 5 | issue with timer | timer stopped | technical problem | more data about this ticket |
| 6 | hq | laptop issues | some value | more data |
| 7 | a thing | about a thing | about something | more data |
+----+------------------+-----------------+-------------------+-----------------------------+
Is there a function that searches the values of array objects and returns objects with qualifying values? Something like:
select
id,
get_object_where(name = 'category', value) as category,
get_object_where(name = 'subcategory', value) as subcategory,
get_object_where(name = 'subsubcategory', value) as subsubcategory
from my_data_table
Unfortunately, PIVOT requires an aggregate function, I tried using min and max, but only get a return of null values. Something similar to this approach would be great if there is another syntax to do it that doesn't require aggregation.
with arr as (
select
id,
cs.value:name col_name,
cs.value:value col_value
from my_data_table,
lateral flatten(input => custom_fields) cs
)
select
*
from arr
pivot(col_name for col_value in ('category', 'subcategory', 'subsubcategory')
as p (id, category, subcategory, subsubcategory);
It is possible to use the following approach, but it is flawed in that any time a new custom field is added I have to add cases to account for new positions within the array.
select
id,
case
when custom_fields[0]:name = 'cf_request' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_request' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_request' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_request' then custom_fields[3]:value
else null
end cf_request,
case
when custom_fields[0]:name = 'cf_related_with' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_related_with' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_related_with' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_related_with' then custom_fields[3]:value
else null
end cf_related_with,
case
when custom_fields[0]:name = 'cf_question' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_question' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_question' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_question' then custom_fields[3]:value
else null
end cf_question,
created_at
from my_db.my_schema.tickets;
I think you almost had it. PIVOT needs an aggregate function, so wrap col_value in max() or min() and pivot on col_name, i.e. pivot(max(col_value) for col_name in (...)), rather than the other way around. max() or min() works here because it aggregates over the name/value pairs that you have: if you had 2 subcategory values, for example, it would pick the min/max of them. From your example, that doesn't appear to be an issue, so it will always return the value you want. Note that the query below also casts the name and value to varchar. I was able to replicate your scenario with this query:
WITH x AS (
SELECT parse_json('[{"name": "category","value": "Bikes"},{"name": "subcategory","value": "Mountain Bikes"},{"name": "subsubcategory","value": "hardtail bikes"}]')::VARIANT as field_var
),
arr as (
select
seq,
cs.value:name::varchar col_name,
cs.value:value::varchar col_value
from x,
lateral flatten(input => x.field_var) cs
)
select
*
from arr
pivot(max(col_value) for col_name in ('category','subcategory','subsubcategory')) as p (seq, category, subcategory, subsubcategory);
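To make it clearer what the flatten plus pivot(max(...)) combination computes, here is a rough Python equivalent. It is only an illustration of the logic, not Snowflake code: pivot_custom_fields is a made-up helper, and the sample ticket reuses the values from the question.

```python
def pivot_custom_fields(rows, names):
    """Turn each row's list of {"name", "value"} objects into one column per name.

    rows  -- iterable of (id, custom_fields) pairs
    names -- the names to pivot into columns (the IN (...) list of PIVOT)
    """
    result = []
    for row_id, custom_fields in rows:
        columns = {name: None for name in names}
        for field in custom_fields:  # corresponds to LATERAL FLATTEN
            name, value = field["name"], field["value"]
            if name in columns and value is not None:
                # Like max(): if a name occurs more than once, keep the
                # largest non-null value; names not listed are ignored.
                current = columns[name]
                columns[name] = value if current is None else max(current, value)
        result.append({"id": row_id, **columns})
    return result


tickets = [
    (5, [
        {"name": "cf_request", "value": "issue with timer"},
        {"name": "cf_related_with", "value": "timer stopped"},
        {"name": "cf_question", "value": "technical problem"},
    ]),
]
pivoted = pivot_custom_fields(tickets, ["cf_request", "cf_related_with", "cf_question"])
print(pivoted)
```

Because each name normally appears once per ticket, the aggregate never has to choose between values; it is only there because PIVOT requires one.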
I'm working with a Postgres table that has a jsonb column. I've been able to use a recordset to turn the JSON in the jsonb object into rows, but I'm struggling to convert the timestamp from Unix epoch seconds to a readable timestamp.
This is what the jsonb object looks like with timestamp stored as UNIX:
{
"signal": [
{
"id": "e80",
"on": true,
"unit": "sample 1",
"timestamp": 1521505355
},
{
"id": "97d",
"on": false,
"unit": "sample 2",
"timestamp": 1521654433
},
{
"id": "97d",
"on": false,
"unit": "sample 3",
"timestamp": 1521654433
}
]
}
Ideally I'd like the result to look like this, but I get an error for the timestamp:
id | on | unit | timestamp
---+------+----------+--------------------------
e80|true | sample 1 | 2018-03-20 00:22:35+00:00
97d|false | sample 2 | 2018-03-21 17:47:13+00:00
97d|false | sample 3 | 2018-03-21 17:47:13+00:00
This is what I have so far. It returns the expected values for the other columns but gives an error for the timestamp column:
select b.*
from device d
cross join lateral jsonb_to_recordset(d.events->'signal') as
b("id" integer, "on" boolean, "unit" text, "timestamp" timestamp)
The timestamp data type is throwing an error:
[22008] ERROR: date/time field value out of range
Any help or suggestions for casting the timestamp from UNIX to an actual timestamp is greatly appreciated.
You can declare it as INTEGER in the column definition list and then convert it to a timestamp using TO_TIMESTAMP.
Furthermore, the id column you are trying to define can't be an integer, because its values (for example "e80") are text.
Query 1:
SELECT b.id
,b.ON
,b.unit
,to_timestamp("timestamp") AS "timestamp"
FROM device d
CROSS JOIN lateral jsonb_to_recordset(d.events -> 'signal')
AS b("id" TEXT, "on" boolean, "unit" TEXT, "timestamp" INT)
Results:
| id | on | unit | timestamp |
|-----|-------|----------|----------------------|
| e80 | true | sample 1 | 2018-03-20T00:22:35Z |
| 97d | false | sample 2 | 2018-03-21T17:47:13Z |
| 97d | false | sample 3 | 2018-03-21T17:47:13Z |
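If you want to double-check the converted values outside the database, the same conversion can be sketched in Python. This is only a sanity check under the assumption that the stored numbers are seconds since the Unix epoch in UTC; epoch_to_utc is a made-up helper name.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds: int) -> datetime:
    """Interpret an integer as seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The "signal" array from the question.
signals = [
    {"id": "e80", "on": True, "unit": "sample 1", "timestamp": 1521505355},
    {"id": "97d", "on": False, "unit": "sample 2", "timestamp": 1521654433},
    {"id": "97d", "on": False, "unit": "sample 3", "timestamp": 1521654433},
]

for s in signals:
    readable = epoch_to_utc(s["timestamp"]).isoformat(sep=" ")
    print(s["id"], s["on"], s["unit"], readable)
# First line printed: e80 True sample 1 2018-03-20 00:22:35+00:00
```

The values agree with the to_timestamp output above, which suggests the original error came from asking jsonb_to_recordset to read a plain number as a timestamp rather than from the data itself.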
I tried to query my json array using the example here: How do I query using fields inside the new PostgreSQL JSON datatype?
They use the example:
SELECT *
FROM json_array_elements(
'[{"name": "Toby", "occupation": "Software Engineer"},
{"name": "Zaphod", "occupation": "Galactic President"} ]'
) AS elem
WHERE elem->>'name' = 'Toby';
But my Json array looks more like this (if using the example):
{
"people": [{
"name": "Toby",
"occupation": "Software Engineer"
},
{
"name": "Zaphod",
"occupation": "Galactic President"
}
]
}
But I get an error: ERROR: cannot call json_array_elements on a non-array
Is my JSON "array" not really an array? I have to use this JSON string because it's stored in a database, so if it's not an array I would have to ask the owners to fix it.
Or is there another way to query it?
I read the documentation, but nothing I tried worked; I kept getting errors.
Your top-level JSON value is an object, not an array. The array is stored under the key people, so pass my_json->'people' to the function:
with my_table(my_json) as (
values(
'{
"people": [
{
"name": "Toby",
"occupation": "Software Engineer"
},
{
"name": "Zaphod",
"occupation": "Galactic President"
}
]
}'::json)
)
select t.*
from my_table t
cross join json_array_elements(my_json->'people') elem
where elem->>'name' = 'Toby';
The function json_array_elements() unnests the json array and generates all its elements as rows:
select elem->>'name' as name, elem->>'occupation' as occupation
from my_table
cross join json_array_elements(my_json->'people') elem
name | occupation
--------+--------------------
Toby | Software Engineer
Zaphod | Galactic President
(2 rows)
If you are interested in Toby's occupation:
select elem->>'occupation' as occupation
from my_table
cross join json_array_elements(my_json->'people') elem
where elem->>'name' = 'Toby'
occupation
-------------------
Software Engineer
(1 row)
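The same distinction shows up if you load the document in any JSON library. As a small illustration in Python (not part of the SQL solution, and with variable names of my own choosing):

```python
import json

doc = json.loads("""
{
  "people": [
    {"name": "Toby", "occupation": "Software Engineer"},
    {"name": "Zaphod", "occupation": "Galactic President"}
  ]
}
""")

# The top-level value is an object (dict); only doc["people"] is an array (list).
print(type(doc).__name__, type(doc["people"]).__name__)  # dict list

# Equivalent of unnesting my_json->'people' and filtering on elem->>'name'.
occupations = [p["occupation"] for p in doc["people"] if p["name"] == "Toby"]
print(occupations)  # ['Software Engineer']
```

So the stored JSON is valid; it just needs one extra step to reach the array before unnesting it.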
I'm trying to create a query which returns data which is filtered on 2 nested objects. I've added (1) and (2) to the code to indicate that I want results from two different nested objects (I know that this isn't a valid query). I've been looking at WITHIN RECORD but I can't get my head around it.
SELECT externalIds.value(1) AS appName, externalIds.value(2) AS driverRef, SUM(quantity)/ 60 FROM [billing.tempBilling]
WHERE callTo = 'example' AND externalIds.type(1) = 'driverRef' AND externalIds.type(2) = 'applicationName'
GROUP BY appName, driverRef ORDER BY appName, driverRef;
The data loaded into BigQuery looks like this:
{
"callTo": "example",
"quantity": 120,
"externalIds": [
{"type": "applicationName", "value": "Example App"},
{"type": "driverRef", "value": 234}
]
}
The result I'm after is this:
+-------------+-----------+----------+
| appName | driverRef | quantity |
+-------------+-----------+----------+
| Example App | 123 | 12.3 |
| Example App | 234 | 132.7 |
| Test App | 142 | 14.1 |
| Test App | 234 | 17.4 |
| Test App | 347 | 327.5 |
+-------------+-----------+----------+
If all of the quantities that you need to sum are within the same record, then you can use WITHIN RECORD for this query. Use NTH() WITHIN RECORD to get the first and second values for a field in the record. Then use HAVING to perform the filtering because it requires a value computed by an aggregation function.
SELECT callTo,
NTH(1, externalIds.type) WITHIN RECORD AS firstType,
NTH(1, externalIds.value) WITHIN RECORD AS maybeAppName,
NTH(2, externalIds.type) WITHIN RECORD AS secondType,
NTH(2, externalIds.value) WITHIN RECORD AS maybeDriverRef,
SUM(quantity) WITHIN RECORD
FROM [billing.tempBilling]
HAVING callTo LIKE 'example%' AND
firstType = 'applicationName' AND
secondType = 'driverRef';
If the quantities to be summed are spread across multiple records, then you can start with this approach and then group by your keys and sum those quantities in an outer query.
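For the multi-record case, the logic of "NTH() per record, filter, then group and sum in an outer query" can be sketched in Python as follows. This is only an illustration of the approach, not BigQuery code: nth and minutes_by_app_and_driver are made-up helpers, and the second and third records are invented sample data.

```python
from collections import defaultdict


def nth(values, n):
    """Return the n-th element (1-based), or None if there are fewer than n."""
    return values[n - 1] if len(values) >= n else None


def minutes_by_app_and_driver(records):
    """Sum quantity / 60 per (appName, driverRef) pair."""
    totals = defaultdict(float)
    for rec in records:
        types = [e["type"] for e in rec["externalIds"]]
        values = [e["value"] for e in rec["externalIds"]]
        # Same conditions as the HAVING clause in the query above.
        if not rec["callTo"].startswith("example"):
            continue
        if nth(types, 1) != "applicationName" or nth(types, 2) != "driverRef":
            continue
        # Outer query: GROUP BY appName, driverRef and SUM(quantity) / 60.
        totals[(nth(values, 1), nth(values, 2))] += rec["quantity"] / 60
    return dict(totals)


records = [
    {"callTo": "example", "quantity": 120, "externalIds": [
        {"type": "applicationName", "value": "Example App"},
        {"type": "driverRef", "value": 234}]},
    # Hypothetical second record with the same keys.
    {"callTo": "example", "quantity": 60, "externalIds": [
        {"type": "applicationName", "value": "Example App"},
        {"type": "driverRef", "value": 234}]},
    # Hypothetical record with the ids in the opposite order: filtered out.
    {"callTo": "example", "quantity": 30, "externalIds": [
        {"type": "driverRef", "value": 234},
        {"type": "applicationName", "value": "Example App"}]},
]
totals = minutes_by_app_and_driver(records)
print(totals)  # {('Example App', 234): 3.0}
```

Note that this approach is positional: a record whose externalIds appear in a different order is dropped by the filter, as the third sample record shows.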