How can I find DDL statements run against a database/schema in Snowflake?

I have a specific requirement where I need to find only the DDL run against my Snowflake database/schema during a specific period of time. How can I find this using QUERY_HISTORY or any other method? Any ideas?

It is possible using QUERY_HISTORY:
SELECT *
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
-- pass END_TIME_RANGE_START/END_TIME_RANGE_END to restrict to a specific time range (see the example below)
WHERE DATABASE_NAME ILIKE '<db_name>'
AND SCHEMA_NAME ILIKE '<schema_name>'
AND QUERY_TYPE ILIKE ANY ('CREATE%', 'ALTER%', 'DROP%', 'DESCRIBE%');
The QUERY_TYPE column contains more descriptive values such as CREATE_VIEW, CREATE_TABLE_AS_SELECT, CREATE_TABLE, ALTER_TABLE_ADD_COLUMN, etc.
To retrieve an entire class of DDL commands, the wildcard patterns CREATE% and ALTER% were used; they can be further tweaked depending on your specific needs.
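For example, to restrict the search to a specific window, the time range can be passed straight into the table function. A minimal sketch (the timestamps and object names are placeholders; note that INFORMATION_SCHEMA.QUERY_HISTORY only covers roughly the last 7 days, so use SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY for older history):
SELECT QUERY_TEXT, QUERY_TYPE, USER_NAME, START_TIME
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(
    END_TIME_RANGE_START => '2023-06-01 00:00:00'::TIMESTAMP_LTZ,  -- placeholder start
    END_TIME_RANGE_END   => '2023-06-02 00:00:00'::TIMESTAMP_LTZ,  -- placeholder end
    RESULT_LIMIT         => 10000))
WHERE DATABASE_NAME ILIKE '<db_name>'
  AND SCHEMA_NAME ILIKE '<schema_name>'
  AND QUERY_TYPE ILIKE ANY ('CREATE%', 'ALTER%', 'DROP%')
ORDER BY START_TIME;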

Related

Is there a query to use in Snowflake to retrieve the original SELECT statement behind the creation of a table?

I am trying to find the original code behind the creation of a table in Snowflake.
I am using this query:
SELECT GET_DDL('table', 'table1');
This is only giving me the original DDL behind the table. I would need the full code (as in the original SQL SELECT statement).
Anyone know what query could get me that?
You can query QUERY_HISTORY and get the SQL statement (and other data) using the following:
// Be sure to use a role with permission to perform the following
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE QUERY_TEXT ILIKE '%create%table%table1%'
ORDER BY END_TIME DESC
LIMIT 20;
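Note that the ILIKE pattern can also match statements that merely mention the table. If you only want the CTAS statement itself, one option (a sketch, reusing the table name table1 from the question) is to filter on QUERY_TYPE as well:
SELECT QUERY_TEXT, USER_NAME, START_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE QUERY_TYPE = 'CREATE_TABLE_AS_SELECT'  -- only CTAS statements
  AND QUERY_TEXT ILIKE '%table1%'            -- table name from the question
  AND EXECUTION_STATUS = 'SUCCESS'
ORDER BY START_TIME DESC
LIMIT 20;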

How to get the original DDL for a table created in a Snowflake environment?

I am attempting to retrieve the original SQL (or DDL) behind 2 tables I created months ago in Snowflake. Does anyone know the query I would need to retrieve this original DDL?
Using QUERY_HISTORY to find the actual query:
SELECT qh.QUERY_TEXT, *
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY qh
WHERE qh.QUERY_TEXT ILIKE '%CREATE%TABLE%<table_name_here>%';
Alternatively, to get the current definition:
SELECT GET_DDL('table', '<table_name_here>');
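If you need the current definitions of several objects at once, GET_DDL also accepts 'schema' and 'database' as object types (the names below are placeholders):
SELECT GET_DDL('schema', '<db_name>.<schema_name>');
SELECT GET_DDL('database', '<db_name>');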

Get a list of tables involved in a process

I'd like to run a series of queries (a couple hundred ETL statements) and get a list of which tables are selected from. Is there a way to do this in Snowflake? I was wondering if I could set my connection to a certain role/warehouse and pare the information down that way, or some such, but I'm not sure what clever ways there might be to get this information.
Thank you kindly!
To obtain the SELECT statements from your ETLs:
At the start of your ETL, set the QUERY_TAG or save the SESSION_ID:
alter session set query_tag='MY_ETL'; -- Tag queries
select current_session(); -- Or save this SESSION_ID
Then filter history by QUERY_TAG:
select * from table(information_schema.query_history());
select query_text from table(result_scan(-1))
where query_type='SELECT' and query_tag='MY_ETL'
order by start_time;
or by SESSION_ID:
select * from table(information_schema.query_history_by_session(session_id=>298348393433));
select query_text from table(result_scan(-1))
where query_type='SELECT'
order by start_time;
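If you prefer a single statement, the same filter can be applied directly to the table function instead of going through RESULT_SCAN (a sketch, reusing the MY_ETL tag from above):
select query_text, start_time
from table(information_schema.query_history(result_limit => 10000))
where query_type = 'SELECT' and query_tag = 'MY_ETL'
order by start_time;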
To get the list of tables and other objects, you could then execute EXPLAIN for each SELECT statement returned above and check the OBJECTS column. (This has caveats -- for example, it's based on the logical plan, not actual execution.)
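A minimal sketch of that EXPLAIN approach (the SELECT below is just a placeholder for one of your ETL statements; the quoted lower-case "objects" reflects how EXPLAIN labels its output column):
explain select * from my_db.my_schema.my_table;   -- placeholder ETL statement
select distinct "objects"
from table(result_scan(last_query_id()))
where "objects" is not null;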
If that's too heavy, a trick is to inject metadata, like table names, into comments:
select /* metadata here */ 1;
Then extract the metadata from the QUERY_TEXT:
select * from table(information_schema.query_history());
select regexp_substr(query_text, '/\\*(.*?)\\*/', 1, 1, 'e') metadata, *
from table(result_scan(-1))
where query_type='SELECT' and query_tag='MY_ETL'
order by start_time desc;
But this will miss tables buried in views and functions.
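If your account has access to SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY (Enterprise Edition or higher; it also has some ingestion latency), that view records the base objects each query actually touched, which does catch tables hidden behind views and functions. A rough sketch, reusing the MY_ETL tag from above:
select distinct obj.value:"objectName"::string as table_name
from snowflake.account_usage.access_history ah,
     lateral flatten(input => ah.base_objects_accessed) obj
where obj.value:"objectDomain"::string = 'Table'
  and ah.query_id in (
        select query_id
        from snowflake.account_usage.query_history
        where query_tag = 'MY_ETL'
          and query_type = 'SELECT');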
Hope that's helpful

Can I look up the query id from a SnowSql session in the Snowflake Interface?

Is it possible to do the same thing using the SnowSql command line interface (CLI)? I'd like to view the SQL code for a particular query, as specified by its query ID, using the CLI.
When using the web console, one may go to the History tab and filter by "Query ID" (e.g. "xxx-xxxxx") to view the SQL code and error messages (if any) for that particular query.
You can use LAST_QUERY_ID to retrieve the query IDs for queries in your session.
select last_query_id();    -- gets the most recent query ID in the session
select last_query_id(1);   -- gets the first query ID of the session
select last_query_id(-2);  -- gets the query ID from two queries ago
etc.
Then you can use a query like this to get your actual Query Text if you need it.
SELECT QUERY_TEXT
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION(RESULT_LIMIT => 10))
WHERE QUERY_ID = '018cde0c-0077-09ee-0000-001812d26346'
;
If you need to retrieve query information outside of your Session, I believe you can use ACCOUNT_USAGE if that works for you.
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY WHERE QUERY_ID = '018cde0c-0077-09ee-0000-001812d26346';
Please try this, using the QUERY_HISTORY family of table functions:
SELECT query_id, query_text, user_name, session_id, role_name, warehouse_name,
       execution_status, error_code, error_message, start_time, end_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(DATEADD('HOURS', -1, CURRENT_TIMESTAMP()), CURRENT_TIMESTAMP()))
ORDER BY start_time;
Reference: https://docs.snowflake.net/manuals/sql-reference/functions/query_history.html

String column Search/Replace GUIDs

I have a SQL Profiler trace saved to a table in SQL Server.
I want to perform sum/avg/count analysis of CPU/Reads/Duration on the queries in the trace. But most of the profiler data records calls to stored procedures with uniqueidentifier parameter(s):
EXECUTE GetTransactionCounts @BankGUID = '{231281D7-F6C2-4EAE-98AE-E9196D8016F0}', @SessionGUID='{7F34361F-CEEA-4CEA-8CBD-2704FFE92DEF}'
SELECT SUM(Total) AS Total FROM fn_BalancingAdditionsUS('{C08961DB-0B6A-4E67-A82B-5BBBA0A84A74}')
EXEC CreateCloser '{7F34361F-CEEA-4CEA-8CBD-2704FFE92DEF}', NULL , '{08E74DBB-3BC4-49A7-AA10-95AA6BD24784}'
EXECUTE GetMachineImpressmentForSession @SessionGUID = '{446881BA-1439-4AD8-B33B-C784120EFBA2}'
SELECT SUM(Total) AS Total FROM fn_BalancingAdditionsCanadian('{446881BA-1439-4AD8-B33B-C784120EFBA2}')
SELECT SUM(Total) AS Total FROM fn_BalancingSubtractionsUS('{446881BA-1439-4AD8-B33B-C784120EFBA2}')
So when I try to aggregate the profiler trace data to find the worst-performing queries:
SELECT
Description,
COUNT(*) AS EventCount,
AVG(CPU) AS CPU, SUM(CPU) AS CpuTotal,
AVG(Reads) AS Reads, SUM(Reads) AS ReadsTotal,
AVG(Duration) AS Duration, SUM(Duration) AS DurationTotal
FROM SlowQueriesTrace
GROUP BY Description
then no aggregation occurs, because every GUID is unique. What I need is some way to replace the uniqueidentifier parameters with a generic %g marker:
EXECUTE GetTransactionCounts @BankGUID = %g, @SessionGUID=%g
SELECT SUM(Total) AS Total FROM fn_BalancingAdditionsUS(%g)
EXEC CreateCloser %g, NULL , %g
EXECUTE GetMachineImpressmentForSession @SessionGUID = %g
SELECT SUM(Total) AS Total FROM fn_BalancingAdditionsCanadian(%g)
SELECT SUM(Total) AS Total FROM fn_BalancingSubtractionsUS(%g)
Then my aggregation will work.
Aside from exporting the table to Excel and hand-editing all 10,270 events, can anyone think of any way to perform GUID search & replace pattern matching inside SQL Server?
Other hacks I tried:
Trim description to first 40 characters (i.e. CAST(description AS varchar(40))):
EXECUTE GetTransactionCounts #BankGUID =
SELECT SUM(Total) AS Total FROM fn_Balan
EXEC CreateCloser '{7F34361F-CEEA-4CEA-8
EXECUTE GetMachineImpressmentForSession
SELECT SUM(Total) AS Total FROM fn_Balan
SELECT SUM(Total) AS Total FROM fn_Balan
Except that this merges items that shouldn't be merged, while other items that should be merged are not.
Use SoundEx:
E223
S423
E220
E223
S423
Except that, as you can see, lines that are completely different are given the same soundex. Also, I am unable to determine which query S338 corresponds to.
The hack I ended up using was to create a new Category column, initially NULL. I then spent two hours with carefully selected LIKE clauses to pick out a particular query and "tag" all its rows with a category, e.g.:
UPDATE QueryTrace
SET Category = 'EXECUTE GetTransactionCounts @BankGUID ='
WHERE Description LIKE 'EXECUTE GetTransactionCounts @BankGUID =%'
and
UPDATE QueryTrace
SET Category = 'SELECT SUM(Total) AS Total FROM fn_BalancingAdditionsCanadian'
WHERE Description LIKE '%FROM fn_BalancingAdditionsCanadian%'
That doesn't mean I don't need a solution to this question.
Have you tried using ClearTrace, which performs certain query parameterisations/normalisations?
Another option is to use a CLR function: Determining Poorly Performing Queries for Tuning from SQL Server Workload Trace Files
Whenever you gather workload traces to identify poorly performing queries, you need to import this data into a database table, and to "normalise" and aggregate this information to identify the worst offenders. This can be done in a variety of ways. One way is to define a regular expression such as this SQL CLR method, based on work done by Itzik Ben-Gan and modified by Adam Machanic:
[Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true)]
public static SqlString sqlsig(SqlString querystring)
{
    return (SqlString)Regex.Replace(
        querystring.Value,
        @"([\s,(=<>!](?![^\]]+[\]]))(?:(?:(?:(?:(?# expression coming
        )(?:([N])?(')(?:[^']|'')*('))(?# character
        )|(?:0x[\da-fA-F]*)(?# binary
        )|(?:[-+]?(?:(?:[\d]*\.[\d]*|[\d]+)(?# precise number
        )(?:[eE]?[\d]*)))(?# imprecise number
        )|(?:[~]?[-+]?(?:[\d]+))(?# integer
        )|(?:[nN][uU][lL][lL])(?# null
        ))(?:[\s]?[\+\-\*\/\%\&\|\^][\s]?)?)+(?# operators
        )))",
        @"$1$2$3#$4");
}
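If deploying a CLR assembly isn't an option, a rough pure T-SQL normalisation is also possible. The sketch below is only illustrative (the function name dbo.NormalizeGuids is made up); it repeatedly finds the next quoted '{GUID}' literal with PATINDEX and overwrites it with the %g marker:
CREATE FUNCTION dbo.NormalizeGuids (@sql nvarchar(max))
RETURNS nvarchar(max)
AS
BEGIN
    -- LIKE/PATINDEX pattern for a quoted, braced GUID: '{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}'
    DECLARE @hex varchar(20) = '[0-9A-Fa-f]';
    DECLARE @pattern nvarchar(max) =
        '%''{' + REPLICATE(@hex, 8) + '-' + REPLICATE(@hex, 4) + '-'
               + REPLICATE(@hex, 4) + '-' + REPLICATE(@hex, 4) + '-'
               + REPLICATE(@hex, 12) + '}''%';
    DECLARE @pos int = PATINDEX(@pattern, @sql);

    WHILE @pos > 0
    BEGIN
        -- the matched literal is 40 characters long: quote + brace + 36 GUID chars + brace + quote
        SET @sql = STUFF(@sql, @pos, 40, '%g');
        SET @pos = PATINDEX(@pattern, @sql);
    END;

    RETURN @sql;
END;
The aggregation from the question could then group on the normalised text, for example:
SELECT NormalizedQuery,
       COUNT(*) AS EventCount,
       AVG(CPU) AS CPU, SUM(CPU) AS CpuTotal,
       AVG(Duration) AS Duration, SUM(Duration) AS DurationTotal
FROM (SELECT dbo.NormalizeGuids(Description) AS NormalizedQuery, CPU, Duration
      FROM SlowQueriesTrace) AS t
GROUP BY NormalizedQuery;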
Edit by OP: I had not heard of ClearTrace. I tried it.
Edit: Did you use the right trace template to gather the trace?
