Postgres jsonb cast recordset from UNIX to timestamp - arrays

I'm working with a Postgres table that has a jsonb column. I've been able to create a recordset to turn the jsonb object into rows, but I'm struggling to convert the UNIX timestamps into readable timestamps.
This is what the jsonb object looks like with timestamp stored as UNIX:
{
  "signal": [
    {
      "id": "e80",
      "on": true,
      "unit": "sample 1",
      "timestamp": 1521505355
    },
    {
      "id": "97d",
      "on": false,
      "unit": "sample 2",
      "timestamp": 1521654433
    },
    {
      "id": "97d",
      "on": false,
      "unit": "sample 3",
      "timestamp": 1521654433
    }
  ]
}
Ideally I'd like it to look like this, but I get an error for the timestamp:
id  | on    | unit     | timestamp
----+-------+----------+---------------------------
e80 | true  | sample 1 | 2018-03-20 00:22:35+00:00
97d | false | sample 2 | 2018-03-21 17:47:13+00:00
97d | false | sample 3 | 2018-03-21 17:47:13+00:00
This is what I have so far. It returns the expected values for the other columns but gives an error for the timestamp column:
select b.*
from device d
cross join lateral jsonb_to_recordset(d.events->'signal') as
b("id" integer, "on" boolean, "unit" text, "timestamp" timestamp)
The timestamp data type throws an error:
[22008] ERROR: date/time field value out of range
Any help or suggestions for casting the timestamp from UNIX to an actual timestamp is greatly appreciated.

You can specify it as INTEGER in the column definition list and then convert it to a TIMESTAMP using to_timestamp().
Also, the id you are trying to define can't be an integer: values like 'e80' are text.
SQL Fiddle
Query 1:
SELECT b.id
      ,b."on"
      ,b.unit
      ,to_timestamp("timestamp") AS "timestamp"
FROM device d
CROSS JOIN LATERAL jsonb_to_recordset(d.events -> 'signal')
     AS b("id" TEXT, "on" boolean, "unit" TEXT, "timestamp" INT)
Results:
| id | on | unit | timestamp |
|-----|-------|----------|----------------------|
| e80 | true | sample 1 | 2018-03-20T00:22:35Z |
| 97d | false | sample 2 | 2018-03-21T17:47:13Z |
| 97d | false | sample 3 | 2018-03-21T17:47:13Z |
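
As a side note, to_timestamp() accepts a double precision argument, so if the epoch values could ever exceed the INT range (for example, millisecond-precision timestamps), a minimal variant sketch declares the column as BIGINT and scales it (the /1000.0 divisor assumes milliseconds):

select b.id
      ,b."on"
      ,b.unit
      ,to_timestamp("timestamp" / 1000.0) as "timestamp"  -- scale milliseconds to seconds
from device d
cross join lateral jsonb_to_recordset(d.events -> 'signal')
     as b("id" text, "on" boolean, "unit" text, "timestamp" bigint)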

Related

How do I create view on table with jsonb array entries in postgresql

I know there are a bunch of similar questions, but I still couldn't find one for my exact problem.
I have a Postgres table that looks like this:
|----|-------------------------------------------------------------------------------|---------------------|
| id | data                                                                          | timestamp           |
|----|-------------------------------------------------------------------------------|---------------------|
| 1  | [{"key1": "value1", "key2": "value2"}, {"key1": "value1", "key2": "value2"}] | 2020-06-09 13:15:00 |
| 2  | [{"key1": "value1", "key2": "value2"}, {"key1": "value1", "key2": "value2"}] | 2020-06-09 13:20:00 |
|----|-------------------------------------------------------------------------------|---------------------|
I want to create a view on this table so that each key in the jsonb data column gets its own column.
I played around with the json_array_elements and jsonb_each functions but could not get it to work.
Ideally, this view would be generated generically for any number of key names.
Any help is greatly appreciated :)
Edit:
I have the following example structure in my jsonb array:
[{
  "key1": "value",
  "id": "fac30fe9-a39c-445a-84de-637a199f1dfa",
  "subobject1": {
    "subkey1": "subvalue1",
    "subkey2": "subvalue2"
  },
  "key3": "value3"
},
{
  "key1": "value",
  "id": "fac30fe9-a39c-445a-84de-637a199f1dfa",
  "subobject1": {
    "subkey1": "subvalue1",
    "subkey2": "subvalue2"
  },
  "key3": "value3"
}]
I think I figured out a solution that I can live with for now:
CREATE VIEW viewX AS
SELECT id,
       messStellen ->> 'uuid' AS uuid
FROM (
    WITH A AS (
        SELECT id,
               jsonb_array_elements(data) AS messStellen
        FROM "3"
    )
    SELECT *
    FROM A
) x
However, I would be grateful if anyone could point me in the right direction on how to template such a query so that it applies to a table whose data field consists of JSON objects with other keys.
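
For the keys in the example structure above, a minimal sketch of a flatter version of that view (assuming the table really is named "3" as in the query above; the output column names here are made up): jsonb_array_elements can be used directly in the FROM clause, so the nested CTE isn't needed, and the #>> operator pulls values out of the nested subobject.

CREATE VIEW viewY AS
SELECT id,
       elem ->> 'id'                   AS element_id,  -- hypothetical column names
       elem ->> 'key1'                 AS key1,
       elem ->> 'key3'                 AS key3,
       elem #>> '{subobject1,subkey1}' AS subkey1,
       elem #>> '{subobject1,subkey2}' AS subkey2
FROM "3",
     jsonb_array_elements(data) AS elem;

Generating such a view for arbitrary key names would require dynamic SQL (for example, building the CREATE VIEW statement from a jsonb_object_keys query), since a view's column list is fixed at creation time.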

snowflake pivot attribute values into columns in array of objects

EDIT: I gave bad example data. Updated some details and switched out dummy data for sanitized, actual data.
Source system: Freshdesk via Stitch
Table Structure:
create or replace TABLE TICKETS (
CC_EMAILS VARIANT,
COMPANY VARIANT,
COMPANY_ID NUMBER(38,0),
CREATED_AT TIMESTAMP_TZ(9),
CUSTOM_FIELDS VARIANT,
DUE_BY TIMESTAMP_TZ(9),
FR_DUE_BY TIMESTAMP_TZ(9),
FR_ESCALATED BOOLEAN,
FWD_EMAILS VARIANT,
ID NUMBER(38,0) NOT NULL,
IS_ESCALATED BOOLEAN,
PRIORITY FLOAT,
REPLY_CC_EMAILS VARIANT,
REQUESTER VARIANT,
REQUESTER_ID NUMBER(38,0),
RESPONDER_ID NUMBER(38,0),
SOURCE FLOAT,
SPAM BOOLEAN,
STATS VARIANT,
STATUS FLOAT,
SUBJECT VARCHAR(16777216),
TAGS VARIANT,
TICKET_CC_EMAILS VARIANT,
TYPE VARCHAR(16777216),
UPDATED_AT TIMESTAMP_TZ(9),
_SDC_BATCHED_AT TIMESTAMP_TZ(9),
_SDC_EXTRACTED_AT TIMESTAMP_TZ(9),
_SDC_RECEIVED_AT TIMESTAMP_TZ(9),
_SDC_SEQUENCE NUMBER(38,0),
_SDC_TABLE_VERSION NUMBER(38,0),
EMAIL_CONFIG_ID NUMBER(38,0),
TO_EMAILS VARIANT,
PRODUCT_ID NUMBER(38,0),
GROUP_ID NUMBER(38,0),
ASSOCIATION_TYPE NUMBER(38,0),
ASSOCIATED_TICKETS_COUNT NUMBER(38,0),
DELETED BOOLEAN,
primary key (ID)
);
Note the variant field, custom_fields. It undergoes an unfortunate transformation between the API and Snowflake. The resulting field contains an array of three or more objects, each one a custom field. I do not have the ability to change the data format. Examples:
# values could be null
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "none"
}
]
# or values could have a combination of null and non-null values
[
{
"name": "cf_request",
"value": "none"
},
{
"name": "cf_related_with",
"value": "none"
},
{
"name": "cf_question",
"value": "concern"
}
]
# or they could all have non-null values
[
{
"name": "cf_request",
"value": "issue with timer"
},
{
"name": "cf_related_with",
"value": "timer stopped"
},
{
"name": "cf_question",
"value": "technical problem"
}
]
I would essentially like to pivot these into fields in a select query, where the name attribute's value becomes a column header, making the output similar to the following:
+----+------------------+-----------------+-------------------+-----------------------------+
| id | cf_request | cf_related_with | cf_question | all_other_fields |
+----+------------------+-----------------+-------------------+-----------------------------+
| 5 | issue with timer | timer stopped | technical problem | more data about this ticket |
| 6 | hq | laptop issues | some value | more data |
| 7 | a thing | about a thing | about something | more data |
+----+------------------+-----------------+-------------------+-----------------------------+
Is there a function that searches the values of array objects and returns objects with qualifying values? Something like:
select
id,
get_object_where(name = 'category', value) as category,
get_object_where(name = 'subcategory', value) as subcategory,
get_object_where(name = 'subsubcategory', value) as subsubcategory
from my_data_table
Unfortunately, PIVOT requires an aggregate function; I tried using min and max, but only got null values back. Something similar to this approach would be great if there is another syntax that doesn't require aggregation:
with arr as (
select
id,
cs.value:name col_name,
cs.value:value col_value
from my_data_table,
lateral flatten(input => custom_fields) cs
)
select
*
from arr
pivot(col_name for col_value in ('category', 'subcategory', 'subsubcategory'))
as p (id, category, subcategory, subsubcategory);
It is possible to use the following approach, but it is flawed in that any time a new custom field is added I have to add cases to account for new positions within the array.
select
id,
case
when custom_fields[0]:name = 'cf_request' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_request' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_request' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_request' then custom_fields[3]:value
else null
end cf_request,
case
when custom_fields[0]:name = 'cf_related_with' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_related_with' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_related_with' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_related_with' then custom_fields[3]:value
else null
end cf_related_with,
case
when custom_fields[0]:name = 'cf_question' then custom_fields[0]:value
when custom_fields[1]:name = 'cf_question' then custom_fields[1]:value
when custom_fields[2]:name = 'cf_question' then custom_fields[2]:value
when custom_fields[3]:name = 'cf_question' then custom_fields[3]:value
else null
end cf_question,
created_at
from my_db.my_schema.tickets;
I think you almost had it. You just need an aggregate: swap col_name and col_value in your PIVOT clause and wrap col_value in max() or min(). As you stated, PIVOT needs an aggregate function, and max() or min() works here, since it aggregates over the name/value pairs you have. If you had two subcategory values, for example, it would pick the min/max value; from your example that doesn't appear to be an issue, so it will always choose the value you want. I was able to replicate your scenario with this query:
WITH x AS (
SELECT parse_json('[{"name": "category","value": "Bikes"},{"name": "subcategory","value": "Mountain Bikes"},{"name": "subsubcategory","value": "hardtail bikes"}]')::VARIANT as field_var
),
arr as (
select
seq,
cs.value:name::varchar col_name,
cs.value:value::varchar col_value
from x,
lateral flatten(input => x.field_var) cs
)
select
*
from arr
pivot(max(col_value) for col_name in ('category','subcategory','subsubcategory')) as p (seq, category, subcategory, subsubcategory);
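
Applied to the TICKETS table from the question, the same pattern would look roughly like this (a sketch only; the column names come from the DDL above and the custom field names from the sample data):

with arr as (
    select id,
           cs.value:name::varchar  as col_name,
           cs.value:value::varchar as col_value
    from tickets,
         lateral flatten(input => custom_fields) cs
)
select *
from arr
pivot(max(col_value) for col_name in ('cf_request', 'cf_related_with', 'cf_question'))
    as p (id, cf_request, cf_related_with, cf_question);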

Postgresql select json array into rows and single text

I have a query to get results from a table, like this:
SELECT test_id, content::json->'scenario'
FROM test
And I got this result, with an array of objects in the scenario column:
test_id | scenario
29 | [{"name":"OpenSignal", "task":[{"name":"speedtest"}]}, {"name":"ITest", "task":[{"name":"speedtest"}]}, {"name":"EqualOne", "task":[{"name":"flashtest"}, {"name":"web"}, {"name":"video"}]}]
30 | [{"name":"Speedtest", "task":[{"name":"speedtest"}]}, {"name":"ITest", "task":[{"name":"speedtest"}]}, {"name":"EqualOne", "task":[{"name":"flashtest"}, {"name":"web"}, {"name":"video"}]}]
The object structure is like this:
[{
"name": "OpenSignal",
"task": [{
"name": "speedtest"
}]
}, {
"name": "ITest",
"task": [{
"name": "speedtest"
}]
}, {
"name": "EqualOne",
"task": [{
"name": "flashtest"
}, {
"name": "web"
}, {
"name": "video"
}]
}]
How can I get a result like this:
test_id | scenario
29 | Opensignal-speedtest
29 | ITest-speedtest
29 | EqualOne-flashtest
29 | EqualOne-web
29 | EqualOne-video
30 | Opensignal-speedtest
30 | ITest-speedtest
30 | EqualOne-flashtest
30 | EqualOne-web
30 | EqualOne-video
And, aggregated into a single string per test_id:
test_id | scenarios
29 | OpenSignal-speedtest,ITest-speedtest,EqualOne-flashtest, EqualOne-web,EqualOne-video
30 | Speedtest-speedtest,ITest-speedtest,EqualOne-flashtest,EqualOne-web,EqualOne-video
Thanks in advance!
For your first query, you could do something like this:
SELECT test_id, CONCAT(sub.element->'name', '-', json_array_elements(sub.element->'task')->'name') as scenario
FROM
(SELECT test_id, json_array_elements(content::json) as element
FROM test) as sub;
I used a subquery to get the elements from your original JSON, then concatenated each name with each task name, separated by a dash.
Then, to get them combined per id, I wrapped it in an outer query using the string_agg function:
SELECT test_id,
string_agg(task, ',')
FROM(
SELECT test_id, CONCAT(sub.element->'name', '-', json_array_elements(sub.element->'task')->'name') as task
FROM
(SELECT test_id, json_array_elements(content::json) as element
FROM test) as sub
)as tasks
GROUP BY test_id
Sorry if it looks a bit messy; here is an SQLFiddle link you can use:
http://sqlfiddle.com/#!17/fcb27/38
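
One caveat: on PostgreSQL 10 and later, set-returning functions such as json_array_elements can no longer be nested inside other function calls like CONCAT, so the queries above will fail there. A sketch of the same idea using LATERAL joins, which works on both old and new versions (assuming, as in the fiddle, that content holds the array directly):

SELECT t.test_id,
       string_agg(s.elem ->> 'name' || '-' || tk.task ->> 'name', ',') AS scenarios
FROM test t
CROSS JOIN LATERAL json_array_elements(t.content::json) AS s(elem)
CROSS JOIN LATERAL json_array_elements(s.elem -> 'task') AS tk(task)
GROUP BY t.test_id;

Dropping the string_agg and GROUP BY yields the one-row-per-task variant.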

Hive lateral view not working AWS Athena

I'm working on AWS CloudTrail log analysis, and I'm stuck extracting JSON from a row.
This is my table definition:
CREATE EXTERNAL TABLE cloudtrail_logs (
eventversion STRING,
eventName STRING,
awsRegion STRING,
requestParameters STRING,
elements STRING ,
additionalEventData STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://XXXXXX/CloudTrail'
If I run select elements from cl1 limit 1, it returns this result:
{"groupId":"sg-XXXX","ipPermissions":{"items":[{"ipProtocol":"tcp","fromPort":22,"toPort":22,"groups":{},"ipRanges":{"items":[{"cidrIp":"0.0.0.0/0"}]},"prefixListIds":{}}]}}
I need to show this result as virtual columns, like:
| groupId | ipProtocol | fromPort | toPort| ipRanges.items.cidrIp|
|---------|------------|--------- | ------|-----------------------------|
| -1 | 0 | | | |
I'm using AWS Athena; I tried LATERAL VIEW, and get_json_object does not work in Athena. It's an external table.
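Athena's SQL engine is based on Presto rather than Hive, so instead of a Hive LATERAL VIEW you can cast the extracted JSON array to ARRAY(JSON) and CROSS JOIN UNNEST it: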
select json_extract_scalar(i.item,'$.ipProtocol') as ipProtocol
,json_extract_scalar(i.item,'$.fromPort') as fromPort
,json_extract_scalar(i.item,'$.toPort') as toPort
from cloudtrail_logs
cross join unnest (cast(json_extract(elements,'$.ipPermissions.items')
as array(json))) as i (item)
;
ipProtocol | fromPort | toPort
------------+----------+--------
"tcp" | 22 | 22

Returning BigQuery data filtered on nested objects

I'm trying to create a query which returns data which is filtered on 2 nested objects. I've added (1) and (2) to the code to indicate that I want results from two different nested objects (I know that this isn't a valid query). I've been looking at WITHIN RECORD but I can't get my head around it.
SELECT externalIds.value(1) AS appName, externalIds.value(2) AS driverRef, SUM(quantity)/ 60 FROM [billing.tempBilling]
WHERE callTo = 'example' AND externalIds.type(1) = 'driverRef' AND externalIds.type(2) = 'applicationName'
GROUP BY appName, driverRef ORDER BY appName, driverRef;
The data loaded into BigQuery looks like this:
{
"callTo": "example",
"quantity": 120,
"externalIds": [
{"type": "applicationName", "value": "Example App"},
{"type": "driverRef", "value": 234}
]
}
The result I'm after is this:
+-------------+-----------+----------+
| appName | driverRef | quantity |
+-------------+-----------+----------+
| Example App | 123 | 12.3 |
| Example App | 234 | 132.7 |
| Test App | 142 | 14.1 |
| Test App | 234 | 17.4 |
| Test App | 347 | 327.5 |
+-------------+-----------+----------+
If all of the quantities that you need to sum are within the same record, then you can use WITHIN RECORD for this query. Use NTH() WITHIN RECORD to get the first and second values for a field in the record. Then use HAVING to perform the filtering, since the filter references values computed by aggregation functions:
SELECT callTo,
NTH(1, externalIds.type) WITHIN RECORD AS firstType,
NTH(1, externalIds.value) WITHIN RECORD AS maybeAppName,
NTH(2, externalIds.type) WITHIN RECORD AS secondType,
NTH(2, externalIds.value) WITHIN RECORD AS maybeDriverRef,
SUM(quantity) WITHIN RECORD
FROM [billing.tempBilling]
HAVING callTo LIKE 'example%' AND
firstType = 'applicationName' AND
secondType = 'driverRef';
If the quantities to be summed are spread across multiple records, then you can start with this approach and then group by your keys and sum those quantities in an outer query.
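
A rough sketch of that two-step version, in the same legacy SQL dialect (untested; the inner query is the one above with the HAVING filter moved to the outer WHERE):

SELECT maybeAppName AS appName,
       maybeDriverRef AS driverRef,
       SUM(quantity) / 60 AS quantity
FROM (
  SELECT NTH(1, externalIds.type) WITHIN RECORD AS firstType,
         NTH(1, externalIds.value) WITHIN RECORD AS maybeAppName,
         NTH(2, externalIds.type) WITHIN RECORD AS secondType,
         NTH(2, externalIds.value) WITHIN RECORD AS maybeDriverRef,
         SUM(quantity) WITHIN RECORD AS quantity
  FROM [billing.tempBilling]
  WHERE callTo = 'example'
)
WHERE firstType = 'applicationName'
  AND secondType = 'driverRef'
GROUP BY appName, driverRef
ORDER BY appName, driverRef;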
