splunk query taking long time to return the value, can we eliminate append - query-optimization

I initially used inputlookup to get the data, and the query returned in a fraction of a second. Now I want to search the indexed source files instead, but the query takes a long time to return.
Please suggest how to reduce the run time.
I am thinking of removing some of the multiple append commands:
index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_sip_pbx_forecasts.csv"
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| sort Date, CLLI
| rename CLLI as Office
| search Office="CLGRAB21DS1"
| stats sum(Usage) as Usage by Office, Date
| append
[ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_sip_pbx_forecasts.csv"
| eval Date=strftime(strptime(Date,"%m/%d/%Y"),"%Y-%m-%d")
| reverse
| search Office="CLGRAB21DS1" AND Type="SIP PBX"
| fields Date NB_RTU
| fields - _raw _time ]
| sort Date
| fillnull value="CLGRAB21DS1" Office
| filldown Usage
| filldown NB_RTU
| fillnull value=0 Usage
| eval _time = strptime(Date, "%Y-%m-%d")
| eval latest_time = if("now" == "now", now(), relative_time(now(), "now"))
| where ((_time >= relative_time(now(), "-3y@h")) AND (_time <= latest_time))
| fields - latest_time Date
| append
[ gentimes start=-1
| eval Date=strftime(mvrange(now(),now()+60*60*24*365*3,"1mon"),"%F")
| mvexpand Date
| fields Date
| append
[ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_sip_pbx_forecasts.csv"
| rename "Expected Date of Addition" as edate
| eval edate=strftime(strptime(edate,"%m/%d/%Y"),"%Y-%m-%d")
| rename edate as "Expected Date of Addition"
| table Contact Customer "Expected Date of Addition" "Number of Channels" Switch
| reverse
| search Customer = "Regular Usage" AND Switch = "CLGRAB21DS1"
| rename "Number of Channels" as val
| return $val ]
| reverse
| filldown search
| rename search as Usage
| where Date != ""
| reverse
| append
[ search index=csvlookups source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_sip_pbx_usage.csv" OR source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_dpt_capacity.csv" OR source="F:\\SplunkMonitor\\csvlookups\\Core_Network\\lookup_table_sip_pbx_forecasts.csv"
| rename "Expected Date of Addition" as edate
| eval edate=strftime(strptime(edate,"%m/%d/%Y"),"%Y-%m-%d")
| rename edate as "Expected Date of Addition"
| table Contact Customer "Expected Date of Addition" "Number of Channels" Switch
| reverse
| search Customer != "Regular Usage" AND Switch = "CLGRAB21DS1"
| rename "Expected Date of Addition" as Date
| eval _time=strptime(Date, "%Y-%m-%d")
| rename "Number of Channels" as Forecast
| stats sum(Forecast) as Forecast by Date]
| sort Date
| rename Switch as Office
| eval Forecast1 = if(isnull(Forecast),Usage,Forecast)
| fields - Usage Forecast
| streamstats sum(Forecast1) as Forecast
| fields - Forecast1
| eval Date=strptime(Date, "%Y-%m-%d")
| eval Date=if(Date < now(), now(), Date) ]
| filldown Usage
| filldown Office
| eval Forecast = Forecast + Usage
| eval Usage = if(Forecast >= 0,NULL,Usage)
| eval _time=if(isnull(_time), Date, _time)
| timechart limit=0 span=1w max(Usage) as Usage, max(NB_RTU) as NB_RTU, max(Forecast) as Forecast by Office
| rename "NB_RTU: CLGRAB21DS1" as "RTU's Purchased", "Usage: CLGRAB21DS1" as "Usage", "Forecast: CLGRAB21DS1" as "Forecast"
| filldown "RTU's Purchased" |sort -Forecast

Definitely an expensive query you don't want to run often or over large timeranges. In your first append, why are you using reverse? Are you trying to get latest time and earliest time which is why you used the append? You could use earliest and latest for this and eliminate the first subsearch. You could also consider eventstats instead of stats on that first search since you'll still retain the raw data.
You're also summing by _time, so you should think about binning your _time spans (i.e. | bin Date span=1h). Also, why are you using filldown? I'm guessing you want to grab values from different rows and need the rows to match? If so, use streamstats for this

If inputlookup was working well you should stick with that as you won't get much faster.
It's hard to give specific advice about your query without knowing more about the data and your end goals. In general:
Filter early. Make your base query (before the first '|') as specific as possible. Run your where and search clauses as soon as you can.
Use fields instead of table. It's more efficient.
Sort only when necessary. Usually, it's not necessary.
Fewer appends is better.
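As a rough illustration of "filter early" and "sort only when necessary", here is a minimal Python sketch of the first stage of the query (sum of Usage by Office and Date). The rows are invented for illustration; only the column names and date formats come from the question. It is not SPL and says nothing about how Splunk executes the search, it only shows the order of operations.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows standing in for the usage CSV; column names follow the question.
rows = [
    {"Date": "06/03/2019", "CLLI": "CLGRAB21DS1", "Usage": "10"},
    {"Date": "06/03/2019", "CLLI": "CLGRAB21DS1", "Usage": "5"},
    {"Date": "06/04/2019", "CLLI": "OTHEROFFICE", "Usage": "7"},
    {"Date": "06/04/2019", "CLLI": "CLGRAB21DS1", "Usage": "3"},
]

def usage_by_day(rows, office):
    """Sum Usage per day for one office, filtering before doing any other work."""
    totals = defaultdict(int)
    for row in rows:
        if row["CLLI"] != office:  # filter first, before date conversion
            continue
        day = datetime.strptime(row["Date"], "%m/%d/%Y").strftime("%Y-%m-%d")
        totals[day] += int(row["Usage"])
    # Sort once, on the small aggregated result rather than on every raw row.
    return dict(sorted(totals.items()))

print(usage_by_day(rows, "CLGRAB21DS1"))
```

In the original search the office filter runs after eval, sort and rename; moving the equivalent condition on CLLI into the base search would be the SPL counterpart of the early `continue` above.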

Related

CannotPlanException after "CROSS JOIN UNNEST"

When I create a VIEW as a result of "CROSS JOIN UNNEST" and then use the condition in the WHERE clause of the VIEW, it throws an exception "org.apache.calcite.plan.RelOptPlanner$CannotPlanException".
Why am I getting this exception and how should I handle it the right way?
The following is the test code in which an error occurs.
it should "filter with object_key" in {
  tEnv.executeSql(
    s"""CREATE TABLE s3_put_event (
       |  Records ARRAY<
       |    ROW<
       |      s3 ROW<
       |        bucket ROW<name STRING>,
       |        object ROW<key STRING, size BIGINT>
       |      >
       |    >
       |  >
       |) WITH (
       |  'connector' = 'datagen',
       |  'number-of-rows' = '3',
       |  'rows-per-second' = '1',
       |  'fields.Records.element.s3.bucket.name.length' = '8',
       |  'fields.Records.element.s3.object.key.length' = '15',
       |  'fields.Records.element.s3.object.size.min' = '1',
       |  'fields.Records.element.s3.object.size.max' = '1000'
       |)
       |""".stripMargin
  )
  tEnv.executeSql(
    s"""CREATE TEMPORARY VIEW s3_objects AS
       |SELECT object_key, bucket_name
       |FROM (
       |  SELECT
       |    r.s3.bucket.name AS bucket_name,
       |    r.s3.object.key AS object_key,
       |    r.s3.object.size AS object_size
       |  FROM s3_put_event
       |  CROSS JOIN UNNEST(s3_put_event.Records) AS r(s3)
       |) rs
       |WHERE object_size > 0
       |""".stripMargin
  )
  tEnv.executeSql(
    s"""CREATE TEMPORARY VIEW filtered_s3_objects AS
       |SELECT bucket_name, object_key
       |FROM s3_objects
       |WHERE object_key > ''
       |""".stripMargin)
  val result = tEnv.sqlQuery("SELECT * FROM filtered_s3_objects")
  tEnv.toChangelogStream(result).print()
  env.execute()
}
If I remove the condition object_key > '' in the "filtered_s3_objects" VIEW, and do it in the "s3_objects" VIEW, no exception is thrown.
However, my actual query is complicated, so it is not easy to move the condition of the WHERE clause like this. It's hard to use especially if I need to separate the output stream.
I'm not sure that you can use a CROSS JOIN UNNEST on an array with a nested hierarchy (given that you have a ROW in your ARRAY). Either way, could you file a Jira ticket for this? https://issues.apache.org/jira/projects/FLINK/issues/

JQ using not with IN does not work or have any effect?

This code works as expected:
jq --argjson BL ${BL} '.rows[] | select(.cells[] | .value | IN($BL[]))'
It returns a list of elements that contain a value in $BL.
I want to return all those that are not in $BL, so I use | not.
It returns the exact same result as without the | not; it seems to make no difference.
jq --argjson BL ${BL} '.rows[] | select(.cells[] | .value | IN($BL[]) | not)'
Using the following returned nothing at all:
jq --argjson BL ${BL} '.rows[] | select(.cells[] | .value | IN($BL[]|not))'
Is there a simple thing I'm missing with using IN with not?
For reference, $BL is an array of email addresses. I am trying to make an API call and return all elements that don't have an email listed in $BL.
Your select receives a series of boolean values, one for each item in the .cells array. Using not inverts all of them, which means if you had a mixed set of boolean values, it would still be mixed, and in either case select would take those being evaluated to true.
The solution is to use any or all to aggregate these boolean values. Without any sample data, I assume you are looking for
.rows[] | select(any(.cells[]; .value | IN($BL[])) | not)
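To see why a plain `| not` can appear to change nothing, here is a minimal Python sketch of the two behaviours. The sample rows and the blocklist are invented for illustration; only the shape (rows with cells that hold a value) comes from the question.

```python
# Hypothetical sample data; field names mirror the jq filter, values are made up.
BL = ["a@example.com", "b@example.com"]
rows = [
    {"id": 1, "cells": [{"value": "a@example.com"}, {"value": "x@example.com"}]},
    {"id": 2, "cells": [{"value": "y@example.com"}, {"value": "z@example.com"}]},
]

def per_cell_select(rows, negate):
    """Mimics select(.cells[] | .value | IN($BL[]) [| not]):
    the row is emitted once for every cell whose test comes out true."""
    out = []
    for row in rows:
        for cell in row["cells"]:
            hit = cell["value"] in BL
            if (not hit) if negate else hit:
                out.append(row["id"])
    return out

def any_select(rows):
    """Mimics select(any(.cells[]; .value | IN($BL[])) | not):
    one aggregated test per row, which is then negated."""
    return [row["id"] for row in rows
            if not any(cell["value"] in BL for cell in row["cells"])]

print(per_cell_select(rows, negate=False))  # row 1 matches via its first cell
print(per_cell_select(rows, negate=True))   # row 1 still appears, via its second cell
print(any_select(rows))                     # only rows with no blocklisted cell
```

With mixed cells, row 1 passes the per-cell test both with and without the negation, which matches the "no difference" symptom; aggregating with any first gives a single yes/no answer per row.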

Is there a documented list of Snowflake query types?

I am working with the view SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY. It would be extremely helpful to have an exhaustive list of query types that might appear in the column QUERY_TYPE, with the type of commands that generate them. For example, does a PUT command generate a PUT query type? Or is it something like "LOAD"?
If anyone knows where such a list can be found, please post a link. Snowflake's documentation of the view does not provide any list.
Thanks all who have answered so far. Since the consensus is that no such list exists, here is a merge of the entries provided so far with the values found in my own database. Please keep posting additional answers if your DB contains entries not found below. This way, sooner or later, we will have a fairly complete list:
QUERY_TYPE
CREATE_USER
REVOKE
DROP_CONSTRAINT
RENAME_SCHEMA
UPDATE
CREATE_VIEW
CREATE_TASK
RENAME_TABLE
INSERT
ALTER_TABLE_ADD_COLUMN
RENAME_COLUMN
MERGE
BEGIN_TRANSACTION
ALTER_VIEW_MODIFY_SECURITY
GRANT
ALTER_SESSION
DELETE
DROP_ROLE
DESCRIBE
UNKNOWN
TRUNCATE_TABLE
DROP
SHOW
ALTER_WAREHOUSE_SUSPEND
GET_FILES
UNLOAD
CREATE_NETWORK_POLICY
ALTER_TABLE_DROP_COLUMN
CREATE
REMOVE_FILES
ALTER
ALTER_USER
PUT_FILES
COPY
ALTER_ACCOUNT
DROP_TASK
CREATE_CONSTRAINT
DESCRIBE_QUERY
SELECT
RENAME_USER
COMMIT
RENAME_VIEW
USE
CREATE_TABLE
ALTER_NETWORK_POLICY
CREATE_ROLE
ALTER_TABLE_MODIFY_COLUMN
SET
ALTER_USER_ABORT_ALL_JOBS
ROLLBACK
LIST_FILES
UNSET
CREATE_TABLE_AS_SELECT
DROP_USER
ALTER_WAREHOUSE_RESUME
ALTER_PIPE
ALTER_ROLE
ALTER_TABLE
ALTER_TABLE_DROP_CLUSTERING_KEY
ALTER_USER_RESET_PASSWORD
CREATE_EXTERNAL_TABLE
CREATE_MASKING_POLICY
CREATE_SEQUENCE
CREATE_STREAM
DROP_STREAM
RENAME_DATABASE
RENAME_FILE_FORMAT
RENAME_ROLE
RENAME_WAREHOUSE
RESTORE
By the looks of it there is no complete list of query types that show up in this table. Best I can do is give you a list from my own database, which still doesn't contain things like alter role etc. To answer your other question a PUT command is actually PUT_FILES by the looks of it:
select distinct query_type from SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY;
+-------------------------+
|QUERY_TYPE |
+-------------------------+
|ALTER |
|ALTER_SESSION |
|ALTER_TABLE_ADD_COLUMN |
|ALTER_TABLE_DROP_COLUMN |
|ALTER_TABLE_MODIFY_COLUMN|
|ALTER_USER |
|ALTER_WAREHOUSE_RESUME |
|ALTER_WAREHOUSE_SUSPEND |
|BEGIN_TRANSACTION |
|COMMIT |
|COPY |
|CREATE |
|CREATE_CONSTRAINT |
|CREATE_EXTERNAL_TABLE |
|CREATE_MASKING_POLICY |
|CREATE_ROLE |
|CREATE_SEQUENCE |
|CREATE_STREAM |
|CREATE_TABLE |
|CREATE_TABLE_AS_SELECT |
|CREATE_USER |
|CREATE_VIEW |
|DELETE |
|DESCRIBE |
|DESCRIBE_QUERY |
|DROP |
|DROP_CONSTRAINT |
|DROP_STREAM |
|DROP_USER |
|GET_FILES |
|GRANT |
|INSERT |
|LIST_FILES |
|MERGE |
|PUT_FILES |
|REMOVE_FILES |
|RENAME_COLUMN |
|RENAME_DATABASE |
|RENAME_TABLE |
|RESTORE |
|REVOKE |
|ROLLBACK |
|SELECT |
|SET |
|SHOW |
|TRUNCATE_TABLE |
|UNKNOWN |
|UNLOAD |
|UPDATE |
|USE |
+-------------------------+
Added ours ... 16 extras ... pass it on :-)
QUERY_TYPE
ALTER
ALTER_ACCOUNT
ALTER_PIPE
ALTER_ROLE
ALTER_SESSION
ALTER_TABLE
ALTER_TABLE_ADD_COLUMN
ALTER_TABLE_DROP_CLUSTERING_KEY
ALTER_TABLE_DROP_COLUMN
ALTER_TABLE_MODIFY_COLUMN
ALTER_USER
ALTER_USER_ABORT_ALL_JOBS
ALTER_USER_RESET_PASSWORD
ALTER_WAREHOUSE_RESUME
ALTER_WAREHOUSE_SUSPEND
BEGIN_TRANSACTION
COMMIT
COPY
CREATE
CREATE_CONSTRAINT
CREATE_EXTERNAL_TABLE
CREATE_MASKING_POLICY
CREATE_NETWORK_POLICY
CREATE_ROLE
CREATE_SEQUENCE
CREATE_STREAM
CREATE_TABLE
CREATE_TABLE_AS_SELECT
CREATE_TASK
CREATE_USER
CREATE_VIEW
DELETE
DESCRIBE
DESCRIBE_QUERY
DROP
DROP_CONSTRAINT
DROP_ROLE
DROP_STREAM
DROP_TASK
DROP_USER
GET_FILES
GRANT
INSERT
LIST_FILES
MERGE
PUT_FILES
REMOVE_FILES
RENAME_COLUMN
RENAME_DATABASE
RENAME_FILE_FORMAT
RENAME_ROLE
RENAME_SCHEMA
RENAME_TABLE
RENAME_USER
RENAME_VIEW
RENAME_WAREHOUSE
RESTORE
REVOKE
ROLLBACK
SELECT
SET
SHOW
TRUNCATE_TABLE
UNKNOWN
UNLOAD
UNSET
UPDATE
USE
Here are some additional ones:
ALTER_AUTO_RECLUSTER
ALTER_SET_TAG
ALTER_TABLE_MODIFY_CONSTRAINT
ALTER_UNSET_TAG
CALL
DROP_SESSION_POLICY
RECLUSTER

Trying to output 2 objects from an array using a sort-object on one of them

I have an array of scheduled tasks and their respective run times. I want to sort the array into next-run order but I cannot select both the task name and run time for output.
I get the scheduled tasks and run times from the remote computer like this:
$Array = @(Invoke-Command -CN blahcomputernameblah {schtasks.exe /query /fo csv | ConvertFrom-Csv | select "Next Run Time", TaskName})
The result is like this:
Next Run Time          TaskName
-------------          --------
6/3/2019 8:00:00 PM    \Start Banana
6/3/2019 4:00:00 PM    \Start Apple
6/5/2019 9:30:00 AM    \Start Orange
6/3/2019 10:15:00 PM   \Stop Banana
6/3/2019 6:15:00 PM    \Stop Apple
6/5/2019 11:45:00 AM   \Stop Orange
The next task to run will be \Start Apple at 6/3/2019 4:00:00 PM.
So I want to sort that array by next run time and select -First 1, but I can only get the next run time without the task name. I am converting the date and time string to DateTime for the sort:
$Array | %{[DateTime] $_."Next Run Time"} | sort | select -First 1
But I do not know how to add the TaskName to the output
Any help please?
As the output of schtasks.exe is localized, I suggest using the equivalent PowerShell cmdlets directly, which avoids the [datetime] conversion in the first place.
Get-ScheduledTask |
Get-ScheduledTaskInfo |
Where-Object NextRunTime |
Sort-Object NextRunTime |
Select-Object TaskName,NextRunTime -First 1
You can create a calculated property in your select like this:
schtasks.exe /query /fo csv | ConvertFrom-Csv | select @{N="DateRunTime";E={[DateTime]$_."Next Run Time"}}, TaskName | sort DateRunTime | select -First 1
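The idea behind both answers is to keep the whole record and sort on a converted key, instead of projecting the record down to the key before sorting. A minimal Python sketch of that idea, using the rows from the question; the format string is an assumption that the dates are in the en-US layout shown there:

```python
from datetime import datetime

# Rows copied from the question's output.
tasks = [
    {"Next Run Time": "6/3/2019 8:00:00 PM", "TaskName": r"\Start Banana"},
    {"Next Run Time": "6/3/2019 4:00:00 PM", "TaskName": r"\Start Apple"},
    {"Next Run Time": "6/5/2019 9:30:00 AM", "TaskName": r"\Start Orange"},
    {"Next Run Time": "6/3/2019 10:15:00 PM", "TaskName": r"\Stop Banana"},
    {"Next Run Time": "6/3/2019 6:15:00 PM", "TaskName": r"\Stop Apple"},
    {"Next Run Time": "6/5/2019 11:45:00 AM", "TaskName": r"\Stop Orange"},
]

# Assumed date layout (month/day/year, 12-hour clock); localized output would differ.
FMT = "%m/%d/%Y %I:%M:%S %p"

def next_task(tasks):
    """Return the full record with the earliest run time, not just the time."""
    return min(tasks, key=lambda t: datetime.strptime(t["Next Run Time"], FMT))

first = next_task(tasks)
print(first["TaskName"], first["Next Run Time"])
```

The original attempt lost TaskName because the pipeline replaced each object with a bare DateTime before sorting; here the conversion only happens inside the sort key.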
Try doing what you need with the help of the Task Scheduler scripting objects and their APIs. Complete information can be obtained from the Task Scheduler Reference. The following script demonstrates how to do this.
$schedsvc = New-Object -ComObject Schedule.Service
$schedsvc.Connect()
$fq = New-Object System.Collections.Queue
$fq.Enqueue($schedsvc.GetFolder('\'))
$tc = while ( $fq.Count -gt 0 ) {
    $f = $fq.Dequeue()
    $f.GetFolders(0) | ForEach-Object { $fq.Enqueue($_) }
    $f.GetTasks(0) | Select-Object Name, NextRunTime
}
$tc | Where-Object { $_.NextRunTime -gt (Get-Date)} |
    Sort-Object -Property NextRunTime | Select-Object -First 1 |
    Format-Table -AutoSize
Output:
Name NextRunTime
---- -----------
Opera scheduled Autoupdate 1542563984 02.06.2019 22:17:10

How to cast integer to binary in mariadb

I have trouble with the CAST function in MariaDB. The same query gives different results in SQL Server and MariaDB.
In sql server:
Query: select CAST(1234 as binary(10))
Result: 0x000000000000000004D2
In mariadb:
Query: select CAST(1234 as binary(10))
Result: 1234
I don't understand why. Please help explain this and suggest a solution.
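In MariaDB, BINARY is a string type (a byte string), so the cast gives you the characters '1234' rather than the integer's bytes. There is a HEX function, though, that returns the hexadecimal value of an expression.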
select hex(1234);
+-----------+
| hex(1234) |
+-----------+
| 4D2 |
+-----------+
If you want the same format it can be done using other functions:
select concat('0x', lpad(hex(1234),20,'0')) ;
+--------------------------------------+
| concat('0x', lpad(hex(1234),20,'0')) |
+--------------------------------------+
| 0x000000000000000004D2 |
+--------------------------------------+
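The padding width of 20 follows from the target type: binary(10) is 10 bytes, and each byte is two hex digits. A small Python sketch that cross-checks this arithmetic outside the database; the function name is made up for illustration:

```python
def int_to_fixed_binary_hex(value: int, width_bytes: int = 10) -> str:
    """Render value as a big-endian, zero-padded hex literal of width_bytes bytes."""
    return "0x" + value.to_bytes(width_bytes, byteorder="big").hex().upper()

def int_to_padded_hex(value: int, width_bytes: int = 10) -> str:
    """Same result via string padding, mirroring concat('0x', lpad(hex(n), 20, '0'))."""
    return "0x" + format(value, "X").zfill(2 * width_bytes)

print(int_to_fixed_binary_hex(1234))
print(int_to_padded_hex(1234))
```

Both functions produce the same string as the SQL Server result quoted in the question.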

Resources