How to inner join two windowed tables in Flux query language? - database

The goal is to join tables min and max returned by the following query:
data = from(bucket: "my_bucket")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)

min = data
    |> aggregateWindow(
        every: 1d,
        fn: min,
        column: "_value")

max = data
    |> aggregateWindow(
        every: 1d,
        fn: max,
        column: "_value")
The columns of max look like this:
+---------------------------------+
| Columns |
+---------------------------------+
| table MAX |
| _measurement GROUP STRING |
| _field GROUP STRING |
| _value NO GROUP DOUBLE |
| _start GROUP DATETIME:RFC3339 |
| _stop GROUP DATETIME:RFC3339 |
| _time NO GROUP DATETIME:RFC3339 |
| env GROUP STRING |
| path GROUP STRING |
+---------------------------------+
The min table looks the same except the name of the first column. Both tables return data which can be confirmed by running yield(tables:min) or yield(tables:max). The join should be an inner join on columns _measurement, _field, _time, env and path and it should contain both the minimum and the maximum value _value of every window.
When I try to run the following within the InfluxDB Data Explorer
join(tables: {min: min, max: max}, on: ["_time", "_field", "path", "_measurement", "env"], method: "inner")
I get the following error:
Failed to execute Flux query
When I run the job in Bash via influx query --file ./query.flux -r > ./query.csv; I get the following error:
Error: failed to execute query: 504 Gateway Timeout: unable to decode response content type "text/html; charset=utf-8"
No further log output is available to investigate the issue. What's wrong with this join?

According to the documentation, join can only take two tables as parameters. You could try union instead, which can take more than two tables as input.
You might just need to modify the script as below (note that union takes an array of tables, not a record):
union(tables: [min, max])

Related

Store a list of values as a string when creating a table in snowflake

I am trying to create a table with 5 columns. Column #2 (PROGRESS) is a comma-separated list (i.e. 1,2,3,4 etc.), but when trying to create this column as either a string, variant or varchar, Snowflake refuses to allow this. Any advice on how I can create a comma-separated list column from a CSV? I tried to import the data as a TSV, XML, as well as a JSON file, but with no success.
create or replace TABLE AD_HOC.TEMP.NEW_DATA (
    VISITOR_ID VARCHAR(16777216),
    PROGRESS VARCHAR(16777216),
    DATE DATETIME,
    ROLE VARCHAR(16777216),
    FIRST_VISIT DATETIME
) COMMENT='Interaction data';
Goal:
VISITOR_ID | PROGRESS | DATE | ROLE | FIRST_VISIT
111 | [1,2,3] | 1/1/2022 | OWNER | 1/1/2021
123 | [1] | 1/2/2022 | ADMIN | 2/2/2021
23321 | [1,2,3,4] | 2/22/2022 | USER | 3/12/2021
I encoded the column in Python and loaded the data into Snowflake:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# One-hot encode the PROGRESS lists into one indicator column per value,
# then join those columns back onto the original frame.
mlb = MultiLabelBinarizer()
df = doc_data.join(pd.DataFrame(mlb.fit_transform(doc_data.pop('PROGRESS')),
                                columns=mlb.classes_,
                                index=doc_data.index))
df
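If you would rather keep the whole list in a single Snowflake column instead of one-hot encoding it, a minimal sketch is to declare PROGRESS as ARRAY and split the raw string on load. The staging table AD_HOC.TEMP.RAW_DATA below is hypothetical, standing in for wherever the raw comma-separated VARCHAR currently lives:
-- Hedged sketch: PROGRESS as an ARRAY column, built with SPLIT().
-- AD_HOC.TEMP.RAW_DATA is a hypothetical staging table whose PROGRESS
-- column holds the raw comma-separated string, e.g. '1,2,3'.
create or replace TABLE AD_HOC.TEMP.NEW_DATA (
    VISITOR_ID VARCHAR(16777216),
    PROGRESS ARRAY,
    DATE DATETIME,
    ROLE VARCHAR(16777216),
    FIRST_VISIT DATETIME
) COMMENT='Interaction data';

insert into AD_HOC.TEMP.NEW_DATA
select VISITOR_ID,
       SPLIT(PROGRESS, ','),  -- '1,2,3' -> ["1","2","3"]
       DATE,
       ROLE,
       FIRST_VISIT
from AD_HOC.TEMP.RAW_DATA;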

Hive lateral view not working AWS Athena

I'm working on AWS CloudTrail log analysis and I'm getting stuck extracting JSON from a row.
This is my table definition.
CREATE EXTERNAL TABLE cloudtrail_logs (
eventversion STRING,
eventName STRING,
awsRegion STRING,
requestParameters STRING,
elements STRING ,
additionalEventData STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://XXXXXX/CloudTrail'
If I run select elements from cl1 limit 1 it returns this result.
{"groupId":"sg-XXXX","ipPermissions":{"items":[{"ipProtocol":"tcp","fromPort":22,"toPort":22,"groups":{},"ipRanges":{"items":[{"cidrIp":"0.0.0.0/0"}]},"prefixListIds":{}}]}}
I need to show this result as virtual columns like,
| groupId | ipProtocol | fromPort | toPort| ipRanges.items.cidrIp|
|---------|------------|--------- | ------|-----------------------------|
| -1 | 0 | | | |
I'm using AWS Athena, and I tried LATERAL VIEW and get_json_object, but they do not work in Athena.
It's an external table.
select json_extract_scalar(i.item,'$.ipProtocol') as ipProtocol
,json_extract_scalar(i.item,'$.fromPort') as fromPort
,json_extract_scalar(i.item,'$.toPort') as toPort
from cloudtrail_logs
cross join unnest (cast(json_extract(elements,'$.ipPermissions.items')
as array(json))) as i (item)
;
ipProtocol | fromPort | toPort
------------+----------+--------
"tcp" | 22 | 22

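If you also need groupId and the nested cidrIp from the desired output above, a hedged extension of the same query could pull them with additional json_extract_scalar calls (the [0] index assumes a single CIDR entry per permission, as in the sample row):
select json_extract_scalar(elements,'$.groupId') as groupId
      ,json_extract_scalar(i.item,'$.ipProtocol') as ipProtocol
      ,json_extract_scalar(i.item,'$.fromPort') as fromPort
      ,json_extract_scalar(i.item,'$.toPort') as toPort
      ,json_extract_scalar(i.item,'$.ipRanges.items[0].cidrIp') as cidrIp
from cloudtrail_logs
cross join unnest (cast(json_extract(elements,'$.ipPermissions.items')
    as array(json))) as i (item)
;
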
Case insensitive Postgres query with array contains

I have records which contain an array of tags like these:
id | title | tags
--------+-----------------------------------------------+----------------------
124009 | bridge photo | {bridge,photo,Colors}
124018 | Zoom 5 | {Recorder,zoom}
123570 | Sint et | {Reiciendis,praesentium}
119479 | Architecto consectetur | {quia}
I'm using the following SQL query to fetch a specific record by tags ('bridge', 'photo', 'Colors'):
SELECT "listings".* FROM "listings" WHERE (tags @> ARRAY['bridge', 'photo', 'Colors']::varchar[]) ORDER BY "listings"."id" ASC LIMIT $1 [["LIMIT", 1]]
And this returns the first record in this table.
The problem is that the tags are mixed case, and I would like to get the same result if I search for: bridge, photo, colors. Essentially I need to make this search case-insensitive, but I can't find a way to do so with Postgres.
This is the SQL query I've tried which is throwing errors:
SELECT "listings".* FROM "listings" WHERE (LOWER(tags) @> ARRAY['bridge', 'photo', 'colors']::varchar[]) ORDER BY "listings"."id" ASC LIMIT $1
This is the error:
PG::UndefinedFunction: ERROR: function lower(character varying[]) does not exist
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
You can convert text array elements to lower case like this:
select lower(tags::text)::text[]
from listings;
lower
--------------------------
{bridge,photo,colors}
{recorder,zoom}
{reiciendis,praesentium}
{quia}
(4 rows)
Use this in your query:
SELECT *
FROM listings
WHERE lower(tags::text)::text[] @> ARRAY['bridge', 'photo', 'colors']
ORDER BY id ASC;
id | title | tags
--------+--------------+-----------------------
124009 | bridge photo | {bridge,photo,Colors}
(1 row)
You can't apply LOWER() to an array directly, but you can unpack the array, apply it to each element, and reassemble it when you're done:
... WHERE ARRAY(SELECT LOWER(UNNEST(tags))) @> ARRAY['bridge', 'photo', 'colors']
You could also install the citext (case-insensitive text) module; if you declare listings.tags as type citext[], your query should work as-is.
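For completeness, a minimal sketch of the citext route, assuming you are able to install the extension and change the column type:
-- Hedged sketch: with citext[] elements, equality (and therefore array
-- containment) is case-insensitive, so the original query works unchanged.
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE listings ALTER COLUMN tags TYPE citext[] USING tags::citext[];

SELECT listings.*
FROM listings
WHERE tags @> ARRAY['bridge', 'photo', 'colors']::citext[]
ORDER BY id ASC;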

Returning BigQuery data filtered on nested objects

I'm trying to create a query which returns data which is filtered on 2 nested objects. I've added (1) and (2) to the code to indicate that I want results from two different nested objects (I know that this isn't a valid query). I've been looking at WITHIN RECORD but I can't get my head around it.
SELECT externalIds.value(1) AS appName, externalIds.value(2) AS driverRef, SUM(quantity)/ 60 FROM [billing.tempBilling]
WHERE callTo = 'example' AND externalIds.type(1) = 'driverRef' AND externalIds.type(2) = 'applicationName'
GROUP BY appName, driverRef ORDER BY appName, driverRef;
The data loaded into BigQuery looks like this:
{
"callTo": "example",
"quantity": 120,
"externalIds": [
{"type": "applicationName", "value": "Example App"},
{"type": "driverRef", "value": 234}
]
}
The result I'm after is this:
+-------------+-----------+----------+
| appName | driverRef | quantity |
+-------------+-----------+----------+
| Example App | 123 | 12.3 |
| Example App | 234 | 132.7 |
| Test App | 142 | 14.1 |
| Test App | 234 | 17.4 |
| Test App | 347 | 327.5 |
+-------------+-----------+----------+
If all of the quantities that you need to sum are within the same record, then you can use WITHIN RECORD for this query. Use NTH() WITHIN RECORD to get the first and second values of a field in the record. Then filter with HAVING rather than WHERE, because the filter references values computed by aggregation functions.
SELECT callTo,
NTH(1, externalIds.type) WITHIN RECORD AS firstType,
NTH(1, externalIds.value) WITHIN RECORD AS maybeAppName,
NTH(2, externalIds.type) WITHIN RECORD AS secondType,
NTH(2, externalIds.value) WITHIN RECORD AS maybeDriverRef,
SUM(quantity) WITHIN RECORD
FROM [billing.tempBilling]
HAVING callTo LIKE 'example%' AND
firstType = 'applicationName' AND
secondType = 'driverRef';
If the quantities to be summed are spread across multiple records, then you can start with this approach and then group by your keys and sum those quantities in an outer query.
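A hedged sketch of that outer query, still in legacy SQL like the answer above (the SUM(quantity) / 60 mirrors the question's original query):
SELECT maybeAppName AS appName,
       maybeDriverRef AS driverRef,
       SUM(quantity) / 60 AS quantity
FROM (
  SELECT callTo,
         NTH(1, externalIds.type) WITHIN RECORD AS firstType,
         NTH(1, externalIds.value) WITHIN RECORD AS maybeAppName,
         NTH(2, externalIds.type) WITHIN RECORD AS secondType,
         NTH(2, externalIds.value) WITHIN RECORD AS maybeDriverRef,
         SUM(quantity) WITHIN RECORD AS quantity
  FROM [billing.tempBilling]
  HAVING callTo LIKE 'example%' AND
         firstType = 'applicationName' AND
         secondType = 'driverRef'
)
GROUP BY appName, driverRef
ORDER BY appName, driverRef;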

Apply function to every element of an array in a SELECT statement

I am listing all functions of a PostgreSQL schema and need the human-readable types for every argument of the functions. OIDs of the types are represented as an array in proallargtypes. I can unnest the array and apply format_type() to it, but that splits the query into multiple rows for a single function. To avoid that I have to add an outer SELECT to GROUP the argument types again because, apparently, one cannot group an unnested array. All columns are functionally dependent on proname, but because proname is not a primary key I still have to list every column in the GROUP BY clause, which is unnecessary work.
Is there a better way to achieve my goal of an output like this:
proname | ... | protypes
-------------------------------------
test | ... | {integer,integer}
I am currently using this query:
SELECT
proname,
prosrc,
pronargs,
proargmodes,
array_agg(proargtypes), -- see here
proallargtypes,
proargnames,
prodefaults,
prorettype,
lanname
FROM (
SELECT
p.proname,
p.prosrc,
p.pronargs,
p.proargmodes,
format_type(unnest(p.proallargtypes), NULL) AS proargtypes, -- and here
p.proallargtypes,
p.proargnames,
pg_get_expr(p.proargdefaults, 0) AS prodefaults,
format_type(p.prorettype, NULL) AS prorettype,
l.lanname
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_language l
ON l.oid = p.prolang
JOIN pg_catalog.pg_namespace n
ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
) x
GROUP BY proname, prosrc, pronargs, proargmodes, proallargtypes, proargnames, prodefaults, prorettype, lanname
You can use the internal "undocumented" function pg_catalog.pg_get_function_arguments(p.oid).
postgres=# SELECT pg_catalog.pg_get_function_arguments('fufu'::regproc);
pg_get_function_arguments
---------------------------
a integer, b integer
(1 row)
There is no built-in "map" function, so unnest plus array_agg is the only option. You can simplify this with your own custom function:
CREATE OR REPLACE FUNCTION format_types(oid[])
RETURNS text[] AS $$
SELECT ARRAY(SELECT format_type(unnest($1), null))
$$ LANGUAGE sql IMMUTABLE;
and the result:
postgres=# SELECT format_types('{21,22,23}');
format_types
-------------------------------
{smallint,int2vector,integer}
(1 row)
Then your query should be:
SELECT proname, format_types(proallargtypes)
FROM pg_proc
WHERE pronamespace = 2200  -- 2200 is the OID of the public schema
  AND proallargtypes IS NOT NULL;
But the result will probably not be what you expect, because the proallargtypes field is only populated when OUT parameters are used; it is usually empty. You should look at the proargtypes field instead, but it is of type oidvector, so you have to transform it to oid[] first.
postgres=# SELECT proname, format_types(string_to_array(proargtypes::text,' ')::oid[])
FROM pg_proc
WHERE pronamespace = 2200
LIMIT 10;
proname | format_types
------------------------------+----------------------------------------------------
quantile_append_double | {internal,"double precision","double precision"}
quantile_append_double_array | {internal,"double precision","double precision[]"}
quantile_double | {internal}
quantile_double_array | {internal}
quantile | {"double precision","double precision"}
quantile | {"double precision","double precision[]"}
quantile_cont_double | {internal}
quantile_cont_double_array | {internal}
quantile_cont | {"double precision","double precision"}
quantile_cont | {"double precision","double precision[]"}
(10 rows)
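With that helper in place, a hedged sketch of the original listing query, without the outer SELECT and GROUP BY (it assumes the format_types(oid[]) function defined above has been created):
SELECT
    p.proname,
    p.prosrc,
    p.pronargs,
    p.proargmodes,
    format_types(string_to_array(p.proargtypes::text, ' ')::oid[]) AS protypes,
    p.proargnames,
    pg_get_expr(p.proargdefaults, 0) AS prodefaults,
    format_type(p.prorettype, NULL) AS prorettype,
    l.lanname
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_language l ON l.oid = p.prolang
JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'public';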
