How to get the most frequently executed queries in Snowflake

I'd like to know whether there is a better (more cost-efficient) way to get the most frequently executed queries in Snowflake, along with their execution counts and the run time for each time a query was called.
I know the query below gets some of the details required above, but not all of them. It uses information_schema.query_history, which I believe adds to the bill each time we run it, unlike SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY.
Please help me find the most cost-effective solution, as we plan to run this frequently during the initial days to identify and optimize the long-running / most frequent queries.
select hash(query_text),
       query_text,
       count(*),
       avg(compilation_time),
       avg(execution_time)
from table(information_schema.query_history(
         dateadd('hours', -1, current_timestamp()),
         current_timestamp()))
group by 1, 2
order by 3 desc;
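For comparison, here is a rough sketch of the same aggregation against SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY (the column names used here exist in that view, but the time window and ordering are illustrative assumptions). Keep in mind that ACCOUNT_USAGE has up to ~45 minutes of latency but retains roughly a year of history, and querying it still requires a running warehouse like any other query:

select hash(query_text)      as query_fingerprint,
       query_text,
       count(*)              as executions,
       avg(compilation_time) as avg_compilation_ms,
       avg(execution_time)   as avg_execution_ms
from snowflake.account_usage.query_history
where start_time >= dateadd('day', -7, current_timestamp())
group by 1, 2
order by executions desc;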

Related

Query really slow when changing join field

I'm using SQL Server 2008 and I noticed an enormous difference in performance when running these two almost identical queries.
Fast query (takes less than a second):
SELECT Seasons.Description, sum( Sales.Value ) AS Value
FROM Seasons, Sales
WHERE Sales.Property05=Seasons.Season
GROUP BY Seasons.Description
Slow query (takes around 5 minutes):
SELECT Seasons.Description, sum( Sales.Value ) AS Value
FROM Seasons, Sales
WHERE Sales.Property04=Seasons.Season
GROUP BY Seasons.Description
The only difference is that the tables SALES and SEASONS are joined on Property05 in the fast query and Property04 in the slow one.
Neither of the two property fields is part of a key or an index, so I really don't understand why the execution plans and the performance are so different between the two queries.
Can somebody enlighten me?
EDIT: The query is automatically generated by a Business Intelligence program, so I have no control over it. I would normally have used the JOIN ... ON syntax, although I don't know if that makes a difference.
Slow query plan: https://www.brentozar.com/pastetheplan/?id=HkcBc7gXZ
Fast query plan: https://www.brentozar.com/pastetheplan/?id=rJQ95mgXb
Note that the queries above were simplified to their essential parts. The query plans are more detailed.
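For reference, a sketch of the slow query rewritten with explicit JOIN ... ON syntax (functionally equivalent, and the optimizer would normally produce the same plan, so this is only about readability):

SELECT Seasons.Description, SUM(Sales.Value) AS Value
FROM Sales
INNER JOIN Seasons
    ON Sales.Property04 = Seasons.Season
GROUP BY Seasons.Description;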

Improve reporting stored procedure execution time - tuning temporary tables?

I've been tasked with improving the performance of a reporting stored procedure (this is my first real-world performance tuning task). The procedure is called by an SSRS front end and currently takes about 30 seconds to run against the largest amount of data (based on the filters set from the report front end).
The stored procedure contains 19 queries, most of which transform the data from an initial (legacy) format in the base tables into a meaningful dataset to be displayed to the business side.
I've created a query based on a few DMVs to find the most resource-consuming statements in the stored procedure (small snippet below), and I have found one query that takes about 10 seconds, on average, to complete.
select
object_name(st.objectid) [Procedure Name]
, dense_rank() over (partition by st.objectid order by qs.last_elapsed_time desc) [rank-execution time]
, dense_rank() over (partition by st.objectid order by qs.last_logical_reads desc) [rank-logical reads]
, dense_rank() over (partition by st.objectid order by qs.last_worker_time desc) [rank-worker (CPU) time]
, dense_rank() over (partition by st.objectid order by qs.last_logical_writes desc) [rank-logical write]
...
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text (qs.sql_handle) as st
cross apply sys.dm_exec_text_query_plan (qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) as qp
where st.objectid in ( object_id('SuperDooperReportingProcedure') )
order by [rank-execution time]
    , [rank-logical reads]
    , [rank-worker (CPU) time]
    , [rank-logical write] desc
Now, this query is a bit strange, in the sense that the execution plan shows that the bulk of the work (~80%) is done when inserting the data into the local temporary table, not when querying the other tables from which the source data is taken and then manipulated (the screenshot below is from SQL Sentry Plan Explorer).
Also, the execution plan's row estimates are way off: only 4,218 rows are actually inserted into the local temporary table, as opposed to the ~248k rows the execution plan thinks it is moving into it. So, because of this, I'm thinking "statistics", but do those even matter if ~80% of the work is the actual insert into the table?
One of my first recommendations was to rewrite the entire process and the stored procedure so as not to move and transform the data inside the reporting stored procedure, and instead do the data transformation nightly into some persisted tables (real-time data is not required, only data up to the end of the previous day). But the business side does not want to invest time and resources into redesigning this and instead "suggests" I do performance tuning in the sense of finding where and what indexes I can add to speed things up.
I don't believe that adding indexes to the base tables will improve the performance of the report, since most of the time needed to run the query is spent saving the data into a temporary table (which, as far as I know, hits tempdb, meaning the rows are written to disk and the time increases due to I/O latency).
But even so, as I've mentioned, this is my first performance tuning task. I've tried to read as much as possible about this in the last couple of days, and these are my conclusions so far, but I'd like to ask a broader audience for advice and hopefully get a few more insights into what I can do to improve this procedure.
A few clear questions I'd appreciate being answered:
Is there anything incorrect in what I have said above (in my understanding of the database or my assumptions)?
Is it true that adding an index to a temporary table will actually increase the execution time, since the table (and its associated index(es)) is rebuilt on each execution?
Could anything else be done in this scenario without having to rewrite the procedure/queries, only via indexes or other tuning methods? (I've read a few article headlines saying you can also "tune tempdb", but I haven't got into the details of those yet.)
Any help is very much appreciated, and if you need more details I'll be happy to post them.
Update (2 Aug 2016):
The query in question is (partially) shown below. What is missing are a few more aggregate columns and their corresponding lines in the GROUP BY section:
select
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end as AM
,rtrim(ltrim(b.Store)) Store
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0) as Actual
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end),0) as Budget
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end),0) as CleanBudget
...
into #SalvesVsBudgets
from #StoresBudgets b
left join #temp_report_data trd on trd.store_ID = b.StoreID and trd.newSourceID = b.ProgramID
where (b.StoreDivision is not null or (b.StoreDivision is null and b.ProgramName = 'NewProgram'))
group by
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end
,rtrim(ltrim(b.Store))
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0)
I'm not sure if this is actually helpful, but since @kcung requested it, I have added the information.
Also, to answer some of his questions:
the temporary tables have no indexes on them
RAM size: 32 GB
Update (3 Aug 2016):
I have tried @kcung's suggestion to move the CASE statements out of the aggregate-generating query and, unfortunately, the overall procedure time has not noticeably improved; it still fluctuates in a range of about ±0.25 to ±1.0 seconds (yes, both lower and higher than the original version of the stored procedure, but I'm guessing this is due to variable workload on my machine).
The execution plan for the same query, but modified to remove the CASE conditions, leaving only the SUM aggregates, is now:
Adding indexes to the temporary table will definitely improve read calls but will slow down the write calls to that table.
Here, as you mentioned, there are 19 queries executing in the procedure, so analyzing only one query with its execution plan will not be of much help.
In addition, if possible, execute this query on its own and check how much time it takes (and how many rows are affected).
Another approach you may try, though I'm not sure if it's possible in your case, is to use a table variable instead of a temporary table. Using a table variable over a temporary table has additional advantages: the procedure is pre-compiled, no transaction logs are maintained, and you don't need to write DROP TABLE.
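A minimal sketch of what that swap might look like; the column list below is a guess based on the columns referenced in the posted query, not the real table definition, and dbo.BudgetSource is a hypothetical stand-in for the real source:

declare @StoresBudgets table
(
    StoreID      int,
    ProgramID    int,
    ProgramName  varchar(100),
    Region       varchar(100),
    AM           varchar(100),
    Store        varchar(100),
    budgetType   int,
    budgetMonth  date,
    budgetValue  decimal(18, 2)
);

insert into @StoresBudgets (StoreID, ProgramID, ProgramName, Region, AM, Store, budgetType, budgetMonth, budgetValue)
select StoreID, ProgramID, ProgramName, Region, AM, Store, budgetType, budgetMonth, budgetValue
from dbo.BudgetSource;  -- the rest of the procedure would then read @StoresBudgets instead of #StoresBudgets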
Any chance I can see the query? And the indexes on both tables?
How big is your RAM? How big is a row in each table (roughly)?
Can you update statistics for both tables and resend the query plan?
To answer your questions:
You're mostly right, except for the part about adding indexes. Adding indexes will help the query do lookups. It will also give the query planner the chance to consider a nested loop join instead of the hash join. Unfortunately, I can't say more until my questions are answered.
You shouldn't need to add an index to the temp table. Adding an index to this temp table (or any insert destination table) will increase the write time, because the insert will need to update that index. Just think of an index as a copy of your table with less information: it sits on top of your table and needs to be kept in sync with it. Every write (insert, update, delete) needs to update this index.
Looking at the total row counts of both tables, this query should run far faster than 10 seconds, unless you have a lemon of a PC, in which case it's a different story.
EDIT:
Just want to point out, for point 2, that I didn't realise your source table is a temp table as well. A temporary table is destroyed when the connection's session ends. Adding an index to a temporary table means you add extra time to create that index every time you create the temporary table.
EDIT:
Sorry, I'm on my phone now, so I'll keep this short.
So essentially 2 things:
add a primary key on the temp table at creation time, so you do it in one go. Don't bother adding a nonclustered index or any covering index; you will end up spending more time creating those.
look at your query: all of those CASE WHEN statements, instead of doing them in this query, why don't you add them as another column in the table? Essentially you want to avoid calculation on the fly when doing the GROUP BY. You can leave the SUM() in the query, as it's an aggregate query, but try to reduce run-time calculation as much as possible.
Sample :
case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end as AM
You can create a column named AM when creating table b.
Also, those RTRIM and LTRIM calls: please remove them from the query and do them at table creation time. :)
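A hedged sketch of that idea: compute the derived values once, while #StoresBudgets is being populated, so the later GROUP BY only touches plain columns. The derived expressions come from the posted query; dbo.BudgetSource is a hypothetical stand-in for the real source:

select
    src.ProgramName,
    src.Region,
    case when src.AM is null and src.ProgramName is not null
         then 'Unassigned'
         else src.AM
    end                     as AM,      -- evaluated once here instead of in the GROUP BY
    rtrim(ltrim(src.Store)) as Store,   -- likewise trimmed once
    src.StoreID,
    src.ProgramID,
    src.budgetType,
    src.budgetMonth,
    src.budgetValue,
    src.StoreDivision,
    src.appliesToPeriod
into #StoresBudgets
from dbo.BudgetSource as src;

The aggregate query can then group by b.AM and b.Store directly.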
One suggestion is to increase the execution timeout of the stored procedure:
cmd.CommandTimeout = 200; // in seconds
You can also generate a report link and email it to the user once the report has been generated.
Other than that, use a CTE; never use temp tables, as they are more expensive.

MSSQL performance issue when using nested query in where clause

MSSQL is doing something I don't understand, and I hope to find an answer here.
I have a small query that uses 2 sub-queries in the where clause:
where TerminatedDateTime between @startdate and @enddate
and Workgroup in (select distinct Workgroup from #grouping)
and Skills in (select Skills from #grouping)
The query runs fine, but when I look at the execution plan I see the following:
http://i.stack.imgur.com/ogkRP.png
The query
select distinct Workgroup from #grouping
has one result: "workgroup1"
The result of the query has 541 rows, but it still fetches all the rows within the date selection. If I remove the workgroup and skills parts, the number of rows is the same.
The filtering is done in the hash match.
If I put the workgroup name in place of the select query, I see the following:
where TerminatedDateTime between @startdate and @enddate
and Workgroup in ('workgroup1')
and Skills in (select Skills from #grouping)
http://i.stack.imgur.com/Ydq6C.png
Here it selects the correct number of rows and the query runs much better.
Why is this, and is there a way to run the query with the sub-query and make it select only the relevant rows from the view?
I have tried it with an inner join on the #grouping table, but with the same result: it selects too many rows.
I'm not sure why you need distinct in (select distinct Workgroup from #grouping).
The problem here is that the estimates are off. Without seeing the whole query and the execution plan XML, I'd suggest trying these alternatives:
select workgroup and skills into a #temp table and join to it
add option(recompile) to the statement
Each one should be a solution by itself.
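A rough sketch of both suggestions combined; the view name and the #grouping column types are assumptions, since the full query was not posted:

-- 1) materialise the filter values once
select distinct Workgroup, Skills
into #groupingFilter
from #grouping;

-- 2) join to the materialised table and ask for a recompile with the runtime values
select v.*
from dbo.YourPartitionedView as v        -- hypothetical name for the view in the question
join #groupingFilter as g
    on  v.Workgroup = g.Workgroup
    and v.Skills    = g.Skills
where v.TerminatedDateTime between @startdate and @enddate
option (recompile);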
It would be beneficial to see the execution plan XML anyway.
EDIT (after reviewing the execution plan, thanks for making it available):
This query is over a partitioned view. With check constraints in place, we can see that partition elimination was done properly, according to the runtime values of the @startdate and @enddate parameters.
Why did the optimizer produce different execution plans for the first query (the one with the subquery) and the second (the one with the scalar value)?
As far as the optimizer is concerned, it's just a coincidence that the subquery produced only one row. It has to create an execution plan that will be valid for any output from the subquery, be it no rows, one, or many.
On the other hand, when you specify a scalar value, the optimizer is free to make more straightforward decisions.
Working with a partitioned view made the optimizer's job more difficult, which is why my original recommendations proved useless.
Yes, the optimizer could probably do a better job here. By the way, are workgroups and skills correlated in any way?

SQL Server : wrong index is used when filter value exceeds the index histogram range

We have a very large table, to which 1-2 million rows are added every day.
In this query:
SELECT jobid, exitstatus
FROM jobsData
WHERE finishtime >= {ts '2012-10-04 03:19:26'} AND task = 't1_345345_454'
GROUP BY jobid, exitstatus
Indexes exist on both Task and FinishTime.
We expected the Task index to be used, since it yields far fewer rows. The problem we see is that SQL Server creates a bad query execution plan that uses the FinishTime index instead of the Task index, and the query takes a very long time.
This happens when the finish time value is outside the FinishTime index histogram.
Statistics are updated every day / every few hours, but there are still many cases where the queries are for recent values.
The question: we can see clearly in the estimated execution plan that the estimated number of rows for FinishTime is 1 in this case, so the FinishTime index is selected. Why does SQL Server assume this is 1 when there is no data? Is there a way to tell it to use something more reasonable?
When we replace the date with a slightly earlier one, statistics exist in the histogram and the estimated number of rows is ~7000.
You can use a Plan Guide to instruct the optimizer to use a specific query plan for you. This fits well for generated queries that you cannot modify to add hints.
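A minimal sketch of such a plan guide, assuming the Task index is named IX_jobsData_task (hypothetical) and that the application sends the statement exactly as written in @stmt (plan guides only match on exact statement text):

EXEC sp_create_plan_guide
    @name   = N'PG_jobsData_force_task_index',
    @stmt   = N'SELECT jobid, exitstatus
FROM jobsData
WHERE finishtime >= {ts ''2012-10-04 03:19:26''} AND task = ''t1_345345_454''
GROUP BY jobid, exitstatus',
    @type   = N'SQL',
    @module_or_batch = NULL,
    @params = NULL,
    @hints  = N'OPTION (TABLE HINT(jobsData, INDEX(IX_jobsData_task)))';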

SQL Server last 25 records query optimization

I have 4 million records in one of my tables. I need to get the last 25 records that have been added in the last week.
This is what my current query looks like:
SELECT TOP(25) [t].[EId],
[t].[DateCreated],
[t].[Message]
FROM [dbo].[tblEvent] AS [t]
WHERE ( [t].[DateCreated] >= Dateadd(DAY, Datediff(DAY, 0, Getdate()) - 7, 0)
AND [t].[EId] = 1 )
ORDER BY [t].[DateCreated] DESC
Now, I do not have any indexes on this table and do not intend to add one. This query takes about 10-15 seconds to run and my app times out; is there a way to improve it?
You should create an index on (EId, DateCreated), or at least on DateCreated.
Without this, the only way of optimising it that I can think of would be to maintain the last 25 rows in a separate table via an insert trigger (and possibly update and delete triggers as well).
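A hedged sketch of such an index; the name and the INCLUDE list are assumptions, not part of the original answer:

CREATE NONCLUSTERED INDEX IX_tblEvent_EId_DateCreated
    ON [dbo].[tblEvent] ([EId], [DateCreated] DESC)
    INCLUDE ([Message]);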
If you have an ID column in the table that is auto-incrementing (not the EId, but a separate PK), you can order by ID desc instead of DateCreated; that might make your ORDER BY faster.
Otherwise you do need an index (but your question says you do not want one).
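For illustration, the same query ordered by a hypothetical auto-increment primary key column called [Id]:

SELECT TOP(25) [t].[EId],
       [t].[DateCreated],
       [t].[Message]
FROM [dbo].[tblEvent] AS [t]
WHERE ( [t].[DateCreated] >= Dateadd(DAY, Datediff(DAY, 0, Getdate()) - 7, 0)
        AND [t].[EId] = 1 )
ORDER BY [t].[Id] DESC  -- assumes [Id] roughly follows insertion order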
If the table has no indexes to support the query, you are going to be forced to perform a table scan.
You are going to struggle to get around the table-scan aspect of that, and as the table grows, the response time will get slower.
You are going to have to endeavour to educate your client about the problems they face going forward, and that they should consider an index. If they are saying no, you need to show the evidence to support the reasoning: show them timings with and without the index, and make sure the impact on record insertion is also shown. It's a relatively simple cost/benefit/detriment case for adding or not adding the index. If they insist on no index, then you have no choice but to extend your timeouts.
You should also try a query hint:
http://msdn.microsoft.com/en-us/library/ms181714.aspx
Use OPTION (FAST n), where n is the number of rows.
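For example, the original query with the hint appended (a sketch; FAST asks the optimizer to favour returning the first 25 rows quickly rather than minimising total work):

SELECT TOP(25) [t].[EId],
       [t].[DateCreated],
       [t].[Message]
FROM [dbo].[tblEvent] AS [t]
WHERE ( [t].[DateCreated] >= Dateadd(DAY, Datediff(DAY, 0, Getdate()) - 7, 0)
        AND [t].[EId] = 1 )
ORDER BY [t].[DateCreated] DESC
OPTION (FAST 25);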
