I am exploring the different ways I can monitor this information programmatically. For this specific use case, if any data spills to the SSD, I want the size of the warehouse to be increased for the next set of queries.
I have already explored:
The Maximized vs. Auto-Scale multi-cluster warehouse settings
QUERY_HISTORY - I was unable to find an equivalent of what the query profiler shows
Is it possible to access information on SSD spillage on a specific query id?
There is not currently a spillage metric stored in QUERY_HISTORY, which makes what you are attempting complicated to do programmatically. Until that metric is added to QUERY_HISTORY, you'll have to size warehouses a bit less scientifically. What I have found useful, however, is to leverage bytes_scanned from QUERY_HISTORY as a metric for whether a warehouse is performing its scans optimally. Do some research on your queries to determine what range of bytes_scanned corresponds to spillage for each warehouse size, and then leverage that.
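For example, a rough sketch of that kind of check, assuming you have access to the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view (the one-day window and the 1 GB threshold are just placeholders you would tune per warehouse size):

-- Recent queries whose scans look too big for the warehouse they ran on
SELECT query_id,
       warehouse_name,
       warehouse_size,
       bytes_scanned,
       total_elapsed_time
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -1, CURRENT_TIMESTAMP())
  AND bytes_scanned > 1e9   -- placeholder threshold; tune per warehouse size
ORDER BY bytes_scanned DESC;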
Related
In AWS Redshift we can manage query priority using WLM. Do we have any such feature in Snowflake, or is it done using a multi-warehouse strategy?
I think you've got the right idea that warehouses are typically the best approach to this problem in Snowflake.
If you have a high-priority query/process/account, it's entirely reasonable to give it a dedicated warehouse. That guarantees your query won't be competing for resources with anything else, since nothing else runs on that warehouse.
You can also then size that warehouse appropriately. If it's a small query, or a file-copy query, for example, and it just really needs to run right away, you can give it a dedicated Small/X-Small warehouse. If it's a big query that doesn't run very frequently, you can give it a larger warehouse. If you set it to auto-suspend, you won't even incur much extra cost for the extra dedicated compute.
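A sketch of that setup, with placeholder names and settings:

-- Dedicated warehouse for the high-priority workload; costs nothing while suspended
CREATE WAREHOUSE IF NOT EXISTS priority_wh
  WAREHOUSE_SIZE = 'XSMALL'   -- size it for the dedicated workload
  AUTO_SUSPEND = 60           -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE;         -- wake up automatically when a query arrives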
I know multi-cluster warehouses can have an auto-scaling policy to scale out, but is there a way to automate resizing up or down? I have a set of queries that deal with varying sizes of data, which means I sometimes only need an S warehouse, but sometimes need an XL. I don't think Snowflake provides a built-in mechanism to do this, so I'm looking for advice on how to automate it, maybe with a stored procedure?
You can use the ALTER WAREHOUSE DDL to do what you describe and CALL a stored procedure prior to your queries.
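Something along these lines, for example; this is only a sketch in Snowflake Scripting, and the warehouse and procedure names are made up:

CREATE OR REPLACE PROCEDURE resize_wh(wh_name STRING, wh_size STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
  stmt STRING;
BEGIN
  -- Build and run the ALTER WAREHOUSE statement for the requested size
  stmt := 'ALTER WAREHOUSE ' || wh_name || ' SET WAREHOUSE_SIZE = ''' || wh_size || '''';
  EXECUTE IMMEDIATE stmt;
  RETURN wh_name || ' resized to ' || wh_size;
END;
$$;

-- Bump the warehouse up before the heavy queries, then back down afterwards
CALL resize_wh('MY_WH', 'XLARGE');
-- ... run the big queries ...
CALL resize_wh('MY_WH', 'SMALL');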
Another alternative is to create a warehouse of each size, then run USE WAREHOUSE <foo> prior to your query, which should wake it up, run the query, then suspend once it's inactive (although this comes with the disadvantage of not being able to reuse locally cached data).
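Roughly like this, assuming each warehouse already exists with AUTO_RESUME and AUTO_SUSPEND set (the names are placeholders):

USE WAREHOUSE wh_small;    -- auto-resumes for the lighter queries
-- ... small queries ...
USE WAREHOUSE wh_xlarge;   -- auto-resumes for the heavy queries
-- ... big queries ...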
We currently have a data warehouse that holds data from multiple tenants, on SQL Server 2019. All the tenant databases share the same schema, and data from all the tenants is consolidated into the data warehouse, partitioned by tenant. We have a parameter sniffing problem with the new dashboard because the data varies a lot between tenants: some tenants have fewer than 10,000 rows, while a couple of tenants have up to 5 million rows. As a result, dashboard performance is bad for large tenants when the execution plan was built based on a smaller tenant.
Suggestions available on the internet say to use the RECOMPILE hint, the OPTIMIZE FOR hint, etc. But I have a doubt about the basics of parameter sniffing. Since SQL Server maintains statistics at the partition level, is that statistics information not used to check whether the cached plan is right for a new runtime value? Before executing, are the statistics for the compile-time value ever compared against those for the runtime value to see whether the cached plan is still valid?
Kindly advise.
Embed the Partition number or the TenantID key in the query text
Parameters are for when you want shared, reused query plans. Hard-coding the criteria that cause query plans to vary is the basic right answer here.
And even though "As much as possible, we are refraining from using Dynamic SQL in the code", you should make an exception here.
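A sketch of what that could look like, assuming the dashboard builds its statements through sp_executesql (the table and variable names are just illustrative):

-- Inline the tenant value into the statement text so each tenant gets its own cached plan
DECLARE @TenantId int = 12345;
DECLARE @sql nvarchar(max) =
    N'SELECT SUM(Sales) AS TotalSales
      FROM T
      WHERE TenantId = ' + CAST(@TenantId AS nvarchar(20)) + N';';
EXEC sys.sp_executesql @sql;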
Use OPTION (RECOMPILE)
If you don't end up spending too much time in query optimization, this is almost as good.
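A minimal example of the hint on the dashboard query, assuming @TenantId is the dashboard's parameter and T is the same illustrative table as in the comment example below:

SELECT SUM(Sales) AS TotalSales
FROM T
WHERE TenantId = @TenantId
OPTION (RECOMPILE);   -- plan is rebuilt for the actual @TenantId on every run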
Or add a comment to the query that varies by tenant or tenant size to get a partitioned plan cache. This is also useful for correlating queries to the code paths that generate them, e.g.
/* Dashboard: Sales Overview
Visual: Total Sales
TenantID: 12345 */
select sum(Sales) TotalSales
from T
where TenantId = #tenantId
I want to find the queries in MS Access that use the most CPU and put them in a table in descending order.
I have checked the system tables of MS Access database, but can't find any clue for this.
I am new to MS Access, please help.
For the past 10 or even 15 years, Access has never really been CPU bound. In other words, network speed, disk drive speed, etc. are the main factors.
In the vast majority of cases, throwing more CPU at a problem will not improve performance. If 99% of the time is spent on the network or other factors, then doubling CPU speed can improve things by at most 1%.
However, I will accept that if a query is using lots of CPU, then it stands somewhat to reason that the query is pulling a lot of data. There is no CPU logger for the Access database engine. However, you can look at row counts and the query plan, and that can be done with ShowPlan. How this works and how it can be used is outlined here:
How to get query plans (showplan.out) from Access 2010?
And here is an older article on ShowPlan and how to use it:
https://www.techrepublic.com/article/use-microsoft-jets-showplan-to-write-more-efficient-queries/#
So, ShowPlan is somewhat similar to looking at the query plan in SQL Server. It will tell you things like whether a full table scan is being used to get one row, or whether an index could be or was used. So looking at the query plan, be it in SQL Server or, in this case, Access, is certainly possible. However, details on CPU usage are slim; what you can see is how much data is read and things like whether the plan is doing full table scans, in a fashion similar to the query plans you would see in server-based systems such as SQL Server.
I recently decided to crawl over the indexes on one of our most heavily used databases to see which were suboptimal. I generated the built-in Index Usage Statistics report from SSMS, and it's showing me a great deal of information that I'm unsure how to understand.
I found an article at Carpe Datum about the report, but it doesn't tell me much more than I could assume from the column titles.
In particular, the report differentiates between User activity and system activity, and I'm unsure what qualifies as each type of activity.
I assume that any query that uses a given index increases the '# of user X' columns. But what increases the system columns? building statistics?
Is there anything that depends on the user or role(s) of a user that's running the query?
But what increases the system columns? Building statistics?
SQL Server maintains statistics on an index (this is controlled by an option called "Auto Update Statistics", which is enabled by default). Also, sometimes an index grows or is reorganized on disk. Those things come in under system activity.
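If it helps, I believe that report is built on the sys.dm_db_index_usage_stats DMV, so you can look at the raw user_* and system_* counters directly:

-- user_* columns count reads/writes from your queries; system_* columns count
-- internal work such as statistics updates
SELECT OBJECT_NAME(s.object_id)  AS table_name,
       i.name                    AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates,
       s.system_seeks, s.system_scans, s.system_lookups, s.system_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID();   -- current database only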
Is there anything that depends on the user or role(s) of a user that's running the query?
You could look into using SQL Server Profiler to gather data about which users use which indexes. It allows you to save traces as a table. If you can include index usage in the trace, you could correlate it with users. I'm sure the Showplan events would include it, but that's rather coarse.
This article describes a way to collect a trace, run it through the index tuning wizard, and analyze the result.
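For the correlation step, a rough sketch, assuming you saved the trace to a table (dbo.IndexTrace is a made-up name) and captured the LoginName and TextData columns:

-- Which logins issue the statements captured in the trace, heaviest first
SELECT LoginName, COUNT(*) AS captured_statements
FROM dbo.IndexTrace
WHERE TextData IS NOT NULL
GROUP BY LoginName
ORDER BY captured_statements DESC;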