how to use state_window in TDengine query - tdengine

From the official documentation, there is a STATE_WINDOW clause in the query grammar, like below:
SELECT select_expr [, select_expr ...]
FROM {tb_name_list}
[WHERE where_condition]
[SESSION(ts_col, tol_val)]
[STATE_WINDOW(col)]
[INTERVAL(interval_val [, interval_offset]) [SLIDING sliding_val]]
[FILL(fill_mod_and_val)]
[GROUP BY col_list]
[ORDER BY col_list { DESC | ASC }]
[SLIMIT limit_val [SOFFSET offset_val]]
[LIMIT limit_val [OFFSET offset_val]]
[>> export_file];
What is STATE_WINDOW and how do I use it?

There are directions for STATE_WINDOW on the TAOS Data website; you can refer to:
https://www.taosdata.com/docs/cn/v2.0/taos-sql#aggregation
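In short, STATE_WINDOW(col) cuts the rows into windows by state: a new window opens whenever the value of col changes, so each window covers one contiguous run of rows sharing the same value (per the linked docs, the state column must be an integer or string column). A minimal sketch, assuming a hypothetical table temp_tb with a timestamp column ts, a float column current, and an integer column status used as the state:
-- one output row per contiguous run of identical status values
SELECT FIRST(ts) AS window_start, LAST(ts) AS window_end,
COUNT(*) AS rows_in_state, AVG(current) AS avg_current, status
FROM temp_tb
STATE_WINDOW(status);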

Related

Processing results from plpgsql function

I have a PostgreSQL function which returns this JSON array as text:
[{"seq_no":5796,"start_date":null,"end_date":"2008-09-30 12:32:28","geom_change":"Y"},
{"seq_no":8235,"start_date":"2008-09-30 12:32:28","end_date":"2008-10-02 16:43:24","geom_change":"N"},
{"seq_no":9306,"start_date":"2008-10-02 16:43:24","end_date":"2008-10-02 18:31:09","geom_change":"N"},
{"seq_no":9754,"start_date":"2008-10-02 18:31:09","end_date":"2008-10-07 17:08:25","geom_change":"N"},
{"seq_no":10701,"start_date":"2008-10-07 17:08:25","end_date":"2008-10-08 15:17:48","geom_change":"N"},
{"seq_no":8940,"start_date":"2008-10-08 15:17:48","end_date":"2008-10-08 15:51:47","geom_change":"N"},
{"seq_no":12500,"start_date":"2008-10-08 15:51:47","end_date":"2008-10-08 17:34:35","geom_change":"N"},
{"seq_no":13079,"start_date":"2008-10-08 17:34:35","end_date":"2008-10-08 17:56:03","geom_change":"N"}]
I want to use this data to select seq_no filtered by start/end dates. A preferred result would be a table like this:
seq_no start_date end_date geom_change
------------------------------------------------------------------
5796 NULL 2008-09-30 12:32:28 Y
8235 2008-09-30 12:32:28 2008-10-02 16:43:24 N
Or maybe there is a simpler way to use this data to select seq_no between start_date and end_date?
You can cast your result as jsonb and then use the jsonb functions and operators along with a tsrange as follows:
with indata (jdata) as (
values (
'[{"seq_no":5796,"start_date":null,"end_date":"2008-09-30 12:32:28","geom_change":"Y"},
{"seq_no":8235,"start_date":"2008-09-30 12:32:28","end_date":"2008-10-02 16:43:24","geom_change":"N"},
{"seq_no":9306,"start_date":"2008-10-02 16:43:24","end_date":"2008-10-02 18:31:09","geom_change":"N"},
{"seq_no":9754,"start_date":"2008-10-02 18:31:09","end_date":"2008-10-07 17:08:25","geom_change":"N"},
{"seq_no":10701,"start_date":"2008-10-07 17:08:25","end_date":"2008-10-08 15:17:48","geom_change":"N"},
{"seq_no":8940,"start_date":"2008-10-08 15:17:48","end_date":"2008-10-08 15:51:47","geom_change":"N"},
{"seq_no":12500,"start_date":"2008-10-08 15:51:47","end_date":"2008-10-08 17:34:35","geom_change":"N"},
{"seq_no":13079,"start_date":"2008-10-08 17:34:35","end_date":"2008-10-08 17:56:03","geom_change":"N"}]'::jsonb
)
)
select (j->>'seq_no')::int as seq_no,
(j->>'start_date')::timestamp as start_date,
(j->>'end_date')::timestamp as end_date,
j->>'geom_change' as geom_change
from indata i
cross join lateral jsonb_array_elements(jdata) as e(j)
where tsrange((j->>'start_date')::timestamp, (j->>'end_date')::timestamp, '[]') @> '2008-09-30 12:32:28'::timestamp;
db<>fiddle here
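Note that the '[]' bound specifier makes the range inclusive at both ends, so a boundary instant like 2008-09-30 12:32:28 matches both the row it closes (seq_no 5796) and the row it opens (seq_no 8235); switch to '[)' if each instant should belong to exactly one interval.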

Parsing string with multiple delimiters into columns

I want to split strings into columns.
My columns should be:
account_id, resource_type, resource_name
I have a JSON file source that I have been trying to parse via an ADF data flow. That hasn't worked for me, hence I flattened the data and brought it into SQL Server (I am open to parsing values via ADF or SQL if anyone can show me how). Please check the JSON file at the bottom.
Use this code to query the data I am working with.
CREATE TABLE test.test2
(
resource_type nvarchar(max) NULL
)
INSERT INTO test.test2 ([resource_type])
VALUES
('account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory'),
('account_id:535533456241,resource_type:buckets,resource_name:tni-prod-diva-backups'),
('account_id:369798452057,resource_type:buckets,resource_name:369798452057-s3-manifests'),
('account_id:460085747812,resource_type:buckets,resource_name:vessel-incident-report-nonprod-accesslogs')
The output that I should be able to query in SQL Server should look like this:
account_id      resource_type   resource_name
224526257458    buckets         camp-stage-artifactory
535533456241    buckets         tni-prod-diva-backups
and so forth.
Please help me out and ask for clarification if needed. Thanks in advance.
EDIT:
Source JSON Format:
{
"start_date": "2021-12-01 00:00:00+00:00",
"end_date": "2021-12-31 23:59:59+00:00",
"resource_type": "all",
"records": [
{
"directconnect_connections": [
"account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fh40evn5'",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-ffxgf6kh'",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-fg5j5v6o'",
"account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fgvfo1ej'"
]
},
{
"virtual_interfaces": [
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgvj25vt'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgbw5gs0'",
"account_id:401311080156,resource_type:virtual_interfaces,resource_name:'dxvif-ffnosohr'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fg18bdhl'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffmf6h64'",
"account_id:390251991779,resource_type:virtual_interfaces,resource_name:'dxvif-fgkxjhcj'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffp6kl3f'"
]
}
]
}
Since you don't have a valid JSON string, and not wanting to get into the business of string manipulation... perhaps this will help.
Select B.*
From test2 A
Cross Apply ( Select account_id = max(case when value like 'account_id:%' then stuff(value,1,11,'') end )
,resource_type = max(case when value like 'resource_type:%' then stuff(value,1,14,'') end )
,resource_name = max(case when value like 'resource_name:%' then stuff(value,1,14,'') end )
from string_split(resource_type,',')
)B
Results
account_id resource_type resource_name
224526257458 buckets camp-stage-artifactory
535533456241 buckets tni-prod-diva-backups
369798452057 buckets 369798452057-s3-manifests
460085747812 buckets vessel-incident-report-nonprod-accesslogs
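For what it's worth: STRING_SPLIT splits each row on the commas, yielding three key:value rows per input row; the max(case ...) conditional aggregation folds them back into a single row, and stuff(value, 1, N, '') strips the 'account_id:' / 'resource_type:' / 'resource_name:' prefixes (11 and 14 characters long, respectively). Note that STRING_SPLIT requires a database compatibility level of 130 or higher.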
Unfortunately, the values inside the arrays are not valid JSON. You can patch them up by adding {} to the beginning/end, and adding " on either side of : and ,.
DECLARE @json nvarchar(max) = N'{
"start_date": "2021-12-01 00:00:00+00:00",
"end_date": "2021-12-31 23:59:59+00:00",
"resource_type": "all",
"records": [
{
"directconnect_connections": [
"account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fh40evn5''",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-ffxgf6kh''",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-fg5j5v6o''",
"account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fgvfo1ej''"
]
},
{
"virtual_interfaces": [
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgvj25vt''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgbw5gs0''",
"account_id:401311080156,resource_type:virtual_interfaces,resource_name:''dxvif-ffnosohr''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fg18bdhl''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffmf6h64''",
"account_id:390251991779,resource_type:virtual_interfaces,resource_name:''dxvif-fgkxjhcj''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffp6kl3f''"
]
}
]
}';
SELECT
j4.account_id,
j4.resource_type,
TRIM('''' FROM j4.resource_name) resource_name
FROM OPENJSON(@json, '$.records') j1
CROSS APPLY OPENJSON(j1.value) j2
CROSS APPLY OPENJSON(j2.value) j3
CROSS APPLY OPENJSON('{"' + REPLACE(REPLACE(j3.value, ':', '":"'), ',', '","') + '"}')
WITH (
account_id bigint,
resource_type varchar(20),
resource_name varchar(100)
) j4;
db<>fiddle
The first three calls to OPENJSON have no schema, so the resultset is three columns: key, value, and type. In the case of arrays (j1 and j3), key is the index into the array. In the case of single objects (j2), key is each property name.
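A minimal illustration of that shape, with a made-up two-element array:
SELECT [key], [value], [type]
FROM OPENJSON(N'["a",{"b":1}]');
-- key  value    type
-- 0    a        1    (1 = string)
-- 1    {"b":1}  5    (5 = object)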

Snowflake Retrieve value from Semi Structured Data

I'm trying to retrieve the health value from Snowflake semi structured data in a variant column called extra from table X.
An example of the data can be seen below:
[
{
"party":
"[{\"class\":\"Farmer\",\"gender\":\"Female\",\"ethnicity\":\"NativeAmerican\",\"health\":2},
{\"class\":\"Adventurer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":3},
{\"class\":\"Farmer\",\"gender\":\"Male\",\"ethnicity\":\"White\",\"health\":0},
{\"class\":\"Banker\",\"gender\":\"Female\",\"ethnicity\":\"White\",\"health\":0}
}
]
I have tried reading the Snowflake documentation from https://community.snowflake.com/s/article/querying-semi-structured-data
I have also tried the following queries to flatten the data:
SELECT result.value:health AS PartyHealth
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PartyHealth > 0,
TABLE(FLATTEN(X, 'party')) result
AND
SELECT [0]['party'][0]['health'] AS Health
FROM X
WHERE value = 'Trail'
AND name = 'Completed'
AND PH > 0;
I am trying to retrieve the health value from table X, from column extra, which contains the variant party, which has 4 repeating values [0-3]. I'm not sure how to do this; is someone able to tell me how to query semi-structured data in Snowflake, considering the documentation doesn't make much sense?
First, the JSON value you posted seems wrongly formatted (might be a copy/paste issue).
Here's an example that works:
first, your JSON properly formatted:
[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]
create a table to test:
CREATE OR REPLACE TABLE myvariant (v variant);
insert the JSON value into this table:
INSERT INTO myvariant SELECT PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]');
now, to select a value you start from the column name, in my case v, and as your JSON is an array at the top level, I specify the first value [0], and from there expand, so something like this:
SELECT v[0]:party[0].health FROM myvariant;
The above gives me 2 (the health of the first party member).
For the other entries you can simply do:
SELECT v[0]:party[1].health FROM myvariant;
SELECT v[0]:party[2].health FROM myvariant;
SELECT v[0]:party[3].health FROM myvariant;
Another option might be to make the data more like a table ... I find it easier to work with than the JSON :-)
Just copy/paste the code below and it runs in Snowflake, returning one path/value row per party attribute.
Key Doco is Lateral Flatten
SELECT d4.path, d4.value
from
lateral flatten(input=>PARSE_JSON('[{ "party": [ {"class":"Farmer","gender":"Female","ethnicity":"NativeAmerican","health":2}, {"class":"Adventurer","gender":"Male","ethnicity":"White","health":3}, {"class":"Farmer","gender":"Male","ethnicity":"White","health":0}, {"class":"Banker","gender":"Female","ethnicity":"White","health":0} ] }]') ) as d ,
lateral flatten(input=> d.value) as d2 ,
lateral flatten(input=> d2.value) as d3 ,
lateral flatten(input=> d3.value) as d4
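If you only want the health values, you can filter the flattened rows by path, e.g. append
WHERE d4.path = 'health'
to the query above; d3.index (0-3) can be projected as well to tell the four party members apart.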

Google Bigquery on reddit database to get post along with all the comments

I am new to Google BigQuery. I want to extract the post title, post body, comments, score, and creation date from the database for all posts created on or after 2010 for a subreddit. For now I have been able to query all subreddit comments using
SELECT * FROM `pushshift.rt_reddit.comments` WHERE lower(subreddit)="politics"
But my motive is to join the comments and posts tables in order to generate the required results, and I am not able to find out how to do this. How can that be achieved? Please let me know if any further details are required. Thanks
Just a quick note ... the tables you reference seem to stop @ 2018-08-27 06:59:08 UTC - meaning you may need to find another data source if you're looking for more recent posts/comments.
Standard SQL :
SELECT
s.title,
s.selftext,
s.score,
s.created_utc post_created_utc,
s.author,
ARRAY_AGG( STRUCT( c.body,
c.created_utc,
c.author ) ) comments
FROM
`pushshift.rt_reddit.submissions` s
LEFT OUTER JOIN
`pushshift.rt_reddit.comments` c
ON
CAST(s.id AS string) = c.link_id
WHERE
REGEXP_CONTAINS(c.subreddit, r'(?i)^politics$')
AND s.created_utc > '2009-12-31'
GROUP BY
1,
2,
3,
4,
5
LIMIT
10;
Date SQL :
SELECT
MAX(created_utc)
FROM
`pushshift.rt_reddit.submissions`
Code for fh-bigquery.reddit_comments ... works the same. Maybe use this for post-2018 data and the earlier code for pre-2018.
SELECT
s.title,
s.selftext,
s.score,
TIMESTAMP_SECONDS(s.created_utc ) post_created_utc,
s.author,
c.subreddit,
ARRAY_AGG( STRUCT( c.body,
c.created_utc,
c.author ) ) comments
FROM
`fh-bigquery.reddit_posts.20*` s
LEFT OUTER JOIN
`fh-bigquery.reddit_comments.20*` c
ON
regexp_extract(c.link_id,r'(.{6})\s*$') = s.id
WHERE
TIMESTAMP_SECONDS(s.created_utc ) between '2019-01-01' and '2019-01-03'
GROUP BY
1,
2,
3,
4,
5,
6
LIMIT
10;
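The regexp_extract in the join condition grabs the last six characters of c.link_id because reddit stores a comment's parent submission as a fullname with a type prefix (t3_ plus the bare base-36 id), so link_id would never match s.id directly.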

N1QL : Find latest status from an array

I have a document of type 'User' as-
{
"id":"User-1",
"Name": "Kevin",
"Gender":"M",
"Statuses":[
{
"Status":"ONLINE",
"StatusChangedDate":"2017-11-01T17:12:00Z"
},
{
"Status":"OFFLINE",
"StatusChangedDate":"2017-11-02T13:24:00Z"
},
{
"Status":"ONLINE",
"StatusChangedDate":"2017-11-02T14:35:00Z"
},
{
"Status":"OFFLINE",
"StatusChangedDate":"2017-11-02T15:47:00Z"
}.....
],
"type":"User"
}
I need user's information along with his latest status details based on a particular date (or date range).
I am able to achieve this using subquery and Unnest clause.
Select U.Name, U.Gender, S.Status, S.StatusChangedDate
From (Select U1.id, max(S1.StatusChangedDate) as StatusChangedDate
From UserInformation U1
Unnest U1.Statuses S1
Where U1.type = 'User'
And S1.StatusChangedDate between '2017-11-02T08:00:00Z' And '2017-11-02T11:00:00Z'
And S1.Status = 'ONLINE'
Group by U1.id
) A
Join UserInformation U On Keys A.id
Unnest U.Statuses S
Where S.StatusChangedDate = A.StatusChangedDate;
But is there any other way of achieving this (like by using collection operators and array functions)??
If yes, please provide me a query or guide me through it.
Thanks.
MAX and MIN accept an array argument: the 0th element of the array is the field to aggregate, and the 1st element is the value you want to carry along.
Using this technique you can project a non-aggregate field alongside MIN/MAX, like below.
SELECT U1.Name, U1.Gender, S.Status, S.StatusChangedDate
FROM UserInformation U1
UNNEST U1.Statuses S1
WHERE U1.type = 'User'
AND S1.StatusChangedDate BETWEEN '2017-11-02T08:00:00Z' AND '2017-11-02T11:00:00Z'
AND S1.Status = 'ONLINE'
GROUP BY U1.id, U1.Name, U1.Gender
LETTING S = MAX([S1.StatusChangedDate,S1])[1];
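Here MAX([S1.StatusChangedDate, S1]) compares the two-element arrays element-wise, so it keeps the [date, status-object] pair with the latest date, and the trailing [1] extracts the carried status object.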
In 4.6.3+ you can also try this without UNNEST, using subquery expressions: https://developer.couchbase.com/documentation/server/current/n1ql/n1ql-language-reference/subqueries.html . An array indexing query will be faster.
CREATE INDEX ix1 ON UserInformation(ARRAY v FOR v IN Statuses WHEN v.Status = 'ONLINE' END) WHERE type = "User";
SELECT U1.Name, U1.Gender, S.Status, S.StatusChangedDate
FROM UserInformation U1
LET S = (SELECT RAW MAX([S1.StatusChangedDate,S1])[1]
FROM U1.Statuses AS S1
WHERE S1.StatusChangedDate BETWEEN '2017-11-02T08:00:00Z' AND '2017-11-02T11:00:00Z' AND S1.Status = 'ONLINE')[0]
WHERE U1.type = 'User'
AND ANY v IN U1.Statuses SATISFIES
v.StatusChangedDate BETWEEN '2017-11-02T08:00:00Z' AND '2017-11-02T11:00:00Z' AND v.Status = 'ONLINE' END;
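The ANY ... SATISFIES predicate mirrors the key expression of the array index ix1 (Statuses entries with Status = 'ONLINE'), which is what allows the planner to use the index to locate qualifying users instead of scanning every document.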
