I am using the following query to retrieve query history from my Snowflake database.
SELECT *
FROM table(MY_DATABASE.information_schema.query_history(
end_time_range_start => dateadd(HOUR, -4, current_timestamp()),
end_time_range_end => current_timestamp()
));
Oddly, if the warehouse (size: XS) I am using gets suspended after a period of inactivity, then the next time I attempt to retrieve query history, the history that was there prior to the warehouse's suspension is gone.
I could not find anything documented to explain this.
Anyone run into this issue or related documentation that could explain this?
Thank you!
I can't explain the exact limitations of that INFORMATION_SCHEMA table function (some of these functions return at most around 10,000 rows, or, as you said, lose history once the warehouse turns off), but it's a limited view into the actual query history. You can use the SNOWFLAKE database for the full query history.
It's a massive table, so make sure you put filters on it. Here's an example query to access it:
USE DATABASE snowflake;
USE SCHEMA account_usage;
SELECT *
FROM query_history
WHERE start_time BETWEEN '2020-01-01 00:00' AND '2020-01-03 00:00'
AND DATABASE_NAME = 'DATABASE_NAME'
AND USER_NAME = 'USERNAME'
ORDER BY START_TIME DESC;
1: Your question says the history disappears "after a period of inactivity", but does not specify how long that period of inactivity is:
"after a period of inactivity, then the next time I attempt to retrieve query history, the history that was there prior to the warehouse's suspension is gone."
If it's beyond 7 days, then the data can still be found in the ACCOUNT_USAGE views. Below is a link describing the differences between INFORMATION_SCHEMA and ACCOUNT_USAGE.
https://docs.snowflake.com/en/sql-reference/account-usage.html#differences-between-account-usage-and-information-schema
2: Your query does not filter on USER_NAME or WAREHOUSE_NAME, so it could be that the queries you ran before the warehouse suspended have simply moved outside the 4-hour window in your predicate. Try increasing the time period and check whether the behaviour still exists.
3: In general it's not advisable to query INFORMATION_SCHEMA for query history unless your application requires data without any latency. If possible, use the ACCOUNT_USAGE view to get query history information (it can lag by up to about 45 minutes, but retains a full year of history).
Here is what I did.
1: Created an XS warehouse
2: Set auto_suspend to 5 minutes (see the setup sketch after this list)
3: Ran a few queries
4: Ran your query (which does not specify user_name or warehouse_name), meaning it searches history from all users:
SELECT *
FROM table(MY_DATABASE.information_schema.query_history(
end_time_range_start => dateadd(HOUR, -4, current_timestamp()),
end_time_range_end => current_timestamp()
));
5: It returned a few hundred records.
6: Added a WHERE clause to check for data from my user, which had run a few queries before the warehouse auto-suspended; it returned records:
SELECT *
FROM table(MY_DATABASE.information_schema.query_history(
end_time_range_start => dateadd(HOUR, -4, current_timestamp()),
end_time_range_end => current_timestamp()
))
WHERE USER_NAME = 'ADITYA';
7: Waited 10 minutes so that my warehouse was auto-suspended.
8: Repeated steps 5 and 6, and again they returned records as expected.
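For reference, a minimal sketch of the setup in steps 1 and 2 (the warehouse name is illustrative):
-- Hypothetical setup: an XS warehouse that auto-suspends after 5 minutes idle
CREATE WAREHOUSE test_wh WITH
WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 300 -- seconds
AUTO_RESUME = TRUE;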
Related
I want to query Snowflake as part of a monitoring process: how much time has a user spent executing queries after a particular date? The purpose is to prevent users from running long queries.
Account usage history is what I want to look at. I'm very new to Snowflake.
Is there any way to query this from the metadata?
You can use the ACCOUNT_USAGE QUERY_HISTORY view for this requirement.
There are many columns in this view; use whichever are appropriate for your requirement.
Example Query :
SELECT query_id,
query_text,
query_type,
session_id,
user_name,
warehouse_name,
start_time,
end_time,
total_elapsed_time,
compilation_time,
execution_time
FROM snowflake.account_usage.query_history
WHERE user_name = 'anyusername'
AND CAST(start_time AS DATE) >= 'yourdate in yyyy-mm-dd format'
AND total_elapsed_time > 600000 -- around 10 minutes in milliseconds or you can specify any number here
AND warehouse_name = 'your datawarehouse name'
ORDER BY execution_time DESC;
There is also a parameter called STATEMENT_TIMEOUT_IN_SECONDS to control long-running queries. It sets the amount of time, in seconds, after which a running SQL statement (query, DDL, DML, etc.) is canceled by the system. It can be set at the account, user, or session level, and can also be set for individual warehouses. The default setting is 172800 (2 days).
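For example (the warehouse name is a placeholder), the parameter can be set at the session or warehouse level:
-- Cancel any statement in this session that runs longer than 10 minutes
ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS = 600;
-- Or cap every query that runs on a particular warehouse
ALTER WAREHOUSE my_wh SET STATEMENT_TIMEOUT_IN_SECONDS = 600;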
We have approximately two dozen SQL databases in our Azure portal. I have been asked to evaluate them and see if any are no longer being used and can be dropped. I know how to query their original quotas and current sizes, but I am baffled as to how to query DTUs.
How can I query each DB and see when someone last logged into it or initiated any queries?
Thank you!
The following query should give you an idea of whether a database has been used, based on resource consumption over the last 7 days. Run it in the master database; sys.resource_stats retains roughly 14 days of history, whereas the per-database sys.dm_db_resource_stats only covers about the last hour:
SELECT *
FROM sys.resource_stats
WHERE --database_name = 'AdventureWorksLT' AND
end_time > DATEADD(day, -7, GETDATE())
ORDER BY end_time DESC;
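To spot candidates for dropping across all databases at once, a sketch like this (also run in master, using columns from sys.resource_stats) summarizes recent activity per database:
-- Databases missing from the output, or with near-zero averages, are likely idle
SELECT database_name,
       MAX(end_time) AS last_activity,
       AVG(avg_cpu_percent) AS avg_cpu,
       AVG(avg_data_io_percent) AS avg_io
FROM sys.resource_stats
WHERE end_time > DATEADD(day, -7, GETDATE())
GROUP BY database_name
ORDER BY last_activity DESC;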
Is it possible to find out which query was executing/waiting in an Oracle DB when the JDBC timeout was issued for a session?
I checked the sessions and I can see P2TEXT = 'interrupt', P3TEXT = 'timeout', and wait_class = 'System I/O', but sql_id is blank and p2 is 0.
Thanks in advance.
Use the Active Session History (ASH) data to find out what was running and when.
ASH data is based on sampling. Don't expect to find the exact query at the exact millisecond. But if there is a performance problem it should stick out in a query like this:
select username, sample_time, sql_id
from gv$active_session_history
join dba_users
on gv$active_session_history.user_id = dba_users.user_id
where sample_time between timestamp '2017-05-20 09:00:00' and timestamp '2017-05-21 09:05:00'
order by sample_time desc;
That view generally only contains data for the past day, depending on how busy the system is. If you need to go back further in time you may be able to use DBA_HIST_ACTIVE_SESS_HISTORY instead.
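A sketch of the same query against the AWR copy (this assumes you are licensed for the Diagnostics Pack, which DBA_HIST_ACTIVE_SESS_HISTORY requires; it keeps roughly one sample every 10 seconds for the AWR retention period):
select u.username, h.sample_time, h.sql_id
from dba_hist_active_sess_history h
join dba_users u
on h.user_id = u.user_id
where h.sample_time between timestamp '2017-05-20 09:00:00' and timestamp '2017-05-21 09:05:00'
order by h.sample_time desc;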
I have a front end application which allows others to create elaborate SQL queries based on options selected. This SQL code is then used as a pass through query, and is displayed as part of a form.
However, the pass-through queries created this way are exceptionally slow. When run in Access, they range from 3 minutes to 20 minutes; SQL Server Management Studio 2012 completes identical queries in 45 seconds to 3 minutes. How can this time difference be minimized or eliminated?
tldr:
Time to run query in SSMS: 45 seconds
Time to run query in Access: 3 minutes
Time to open Access form containing query: 3 minutes
Structure behind the question
The queries are structured in the following manner
SELECT ...
FROM
(
SELECT ...
FROM
(
SELECT ... FROM Archive.YearXX
UNION
SELECT ... FROM Archive.YearXX
UNION
SELECT ... FROM Current.Year16
) AS unioned -- derived tables need aliases in T-SQL
LEFT JOIN ReferenceData.Table ON ...
) AS joined
WHERE ...
GROUP BY ...
ORDER BY ...
where the various options can change almost anything in the code: which tables are pulled in the UNION segment, which fields are returned in the SELECT statement, and what is included in the WHERE, GROUP BY, and ORDER BY clauses.
All SQL Servers are in the same local location, and the tables referenced range from 20 fields by 100,000 rows to 30 fields by 13,000,000 rows.
Current connection string (actual data redacted):
ODBC;Driver=SQL Server;Server=[server name];DATABASE=[database];UID=[username];PWD=[password];
VBA code which creates QueryDef:
' Create a pass-through QueryDef from the generated SQL
Set qdf = CurrentDb.CreateQueryDef("GeneratedReportData", GenSQL(False))
' Point it at SQL Server over ODBC (connection details redacted)
qdf.Connect = "ODBC;Driver=SQL Server;Server=[server name];DATABASE=[database];UID=[username];PWD=[password];"
qdf.ReturnsRecords = True
qdf.ODBCTimeout = 0 ' 0 = never time out
Constraints
I am unable to change the languages or software I am using.
There is little that can be done about the structure and use of subqueries.
I have a simple SQL query to count the number of telemetry records by clients within the last 24 hours.
With an index on TimeStamp, the following query runs in less than 1 second for about 10k rows:
select MachineName,count(Message) from Telemetry where TimeStamp between DATEADD(HOUR,-24, getutcdate()) and getutcdate() group by MachineName
However, when I tried making the hard-coded -24 configurable and added a variable, the query took more than 5 minutes to execute.
DECLARE @cutoff int; SET @cutoff = 24;
select MachineName,count(Message) from Telemetry where TimeStamp between DATEADD(HOUR, -1*@cutoff, getutcdate()) and getutcdate() group by MachineName
Is there any specific reason for the significant decrease of performance? What's the best way of adding a variable without impacting performance?
My guess is that you also have an index on MachineName - or that SQL is deciding that since it needs to group by MachineName, that would be a better way to access the records.
Updating statistics as suggested by AngularRat is a good start, but SQL often maintains those automatically. (In fact, the good performance when SQL knows the 24-hour interval in advance is evidence that the statistics are good; when SQL doesn't know the size of the BETWEEN in advance, it thinks other approaches might be a better idea.)
Given:
CREATE TABLE Telemetry ( machineName sysname, message varchar(88), [timestamp] datetime2) -- datetime2, since the T-SQL "timestamp" type is a rowversion synonym and cannot hold a date
CREATE INDEX Telemetry_TS ON Telemetry([timestamp]);
First, try the OPTION (OPTIMIZE FOR ( @cutoff = 24 )) clause to let SQL know how to approach the query; if that is insufficient, then try WITH (INDEX( Telemetry_TS )). Using the INDEX hint is less desirable.
DECLARE @cutoff int = 24;
select MachineName,count(Message)
from Telemetry -- WITH (INDEX( Telemetry_TS))
where TimeStamp between DATEADD(HOUR, -1*@cutoff, getutcdate()) and getutcdate()
group by MachineName
OPTION (OPTIMIZE FOR ( @cutoff = 24 ));
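A further option, not part of the suggestion above but a common alternative: a covering index lets SQL Server satisfy the query entirely from the index, which makes the plan much less sensitive to the row-count estimate for the BETWEEN range (the index name is illustrative):
-- Key column supports the range seek; INCLUDE carries the grouped and
-- aggregated columns so the base table is never touched
CREATE INDEX Telemetry_TS_Covering ON Telemetry([timestamp]) INCLUDE (machineName, message);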
Your parameter should actually work, but you might be seeing an issue where the database is using out-of-date statistics for the query plan. I'd try updating statistics for the table you are querying. Something like:
UPDATE STATISTICS TableName;
Additionally, if your code is running from within a stored procedure, you might want to recompile the procedure. Something like:
EXEC sp_recompile N'ProcedureName';
A lot of times when I have a query that seems like it should run a lot faster but doesn't, it's a statistics/query-plan-out-of-date issue.
References:
https://msdn.microsoft.com/en-us/library/ms187348.aspx
https://msdn.microsoft.com/en-us/library/ms190439.aspx