How does warehouse size change automatically in Snowflake? - snowflake-cloud-data-platform

I have a small warehouse in SnowFlake, with minimum clusters = 1, maximum clusters = 5 and scaling policy set to standard. However, when I was viewing the query history profile, I saw that for some of the queries, size column was set to large, but the cluster number remained 1.
Now, I know that autoscaling helps increasing number of clusters, but how did the warehouse size change for some queries without manual intervention?
I referred to the official documentation of SnowFlake here, but couldn't find any ways to automatically change size of warehouse.

Snowflake does not carry any feature that will automatically alter the size of your warehouse.
It is likely that the tools in use (or users) may have run an ALTER WAREHOUSE SET WAREHOUSE_SIZE=LARGE. The purpose may have been to prepare for a larger operation, ensuring adequate performance temporarily.
Use the various history views to find out who/what and when such a change was run. For example, the QUERY_HISTORY view could be useful in finding the username and role that was used to alter the warehouse size, with the following query:
SELECT DISTINCT user_name, role_name, query_text, session_id, start_time
FROM snowflake.account_usage.query_history
WHERE query_text ILIKE 'ALTER%SET%WAREHOUSE_SIZE%=%LARGE%'
AND start_time > CURRENT_TIMESTAMP() - INTERVAL '7 days';
Then you could use LOGIN_HISTORY view to find which IP the user authenticated from during the time (or use the history UI for precise client information), check all other queries executed in the same session, etc.
To prevent unauthorized users from modifying warehouse sizes, consider restricting warehouse-level grants on their roles (rolename in use can be detected by the query above).

Related

Does Snowflake charge me for queries with WAREHOUSE_SIZE=NULL?

I am trying to reduce the costs of our queries from Snowflake. In the docs, there is a detailed explanation about the credits per second by the size of the used warehouse. When I look at the history tab in the web console, or at the 'snowflake.account_usage.query_history' view, I see that some of the queries have NULL value in the WAREHOUSE_SIZE column (commit statements, desc table etc.). Does anyone know how this type of queries is charged? maybe they are free?
This doesn't seem to be mentioned anywhere in the docs.
The result of any query is available for the following 24 hours through the Service Layer in the Result Cache. Therefore, queries that appear to have run without any Warehouse, do Not effectively use any Warehouse.
However, it doesn't necessarily mean that the Warehouse it was supposed to be used otherwise, was Not running at that time.
Say, for example, another query 'Q1' was executed within "your" warehouse 'MyWH' just before you ran your 'Q2': while yours will hit the cache without needing 'MyWH' running, 'Q1' will still cause 'MyWH' to resume and therefore consume credits.
More details on Snowflake Caching are available here: https://community.snowflake.com/s/article/Caching-in-Snowflake-Data-Warehouse
Queries without a warehouse listed are not charged via the compute credits. These types of queries are using the "cloud services" layer in Snowflake and are charged differently.
Query to determine how much and if cloud services are being used.
use schema snowflake.account_usage;
select query_type, sum(credits_used_cloud_services) cs_credits, count(1) num_queries
from query_history
where true
and start_time >= timestampadd(day, -1, current_timestamp)
group by 1
order by 2 desc
limit 10;

Result Cache Access in Snowflake Cloud Data Warehouse

Snowflake offers various cache options and one of them is result cache. I understand that other users can use query result cache to access the result of repeated query (executed within 24 hours) but should they be in the same role OR users with all roles can access the cache results?
If the tool behavior has recently changed, what holds correct for Snowpro certification exam?
But should they be in the same role OR users with all roles can access the cache results?
The result cache should come into play well after access-authorization checks, so the role-membership of the authenticated user or service does not matter.
The cache is permitted for read in the same manner the table (or view) results are, and the Snowflake architecture separates the storage and cache from the compute layers, so the only rules that matter on its use are the ones defined in the documentation.
Typically, query results are reused if all of the following conditions are met:
The user executing the query has the necessary access privileges for
all the tables used in the query.
The new query syntactically matches the previously-executed query. T
The table data contributing to the query result has not changed.
The persisted result for the previous query is still available.
Any configuration options that affect how the result was produced have
not changed.
The query does not include functions that must be evaluated at
execution (e.g. CURRENT_TIMESTAMP()).
The table’s micro-partitions have not changed (e.g. been re-
clustered or consolidated) due to changes to other data in the table.
Ref : https://community.snowflake.com/s/article/Understanding-Result-Caching
I understood that the role accessing the cached results has the required privileges -
-- If the query was a SELECT query, the role executing the query must have the necessary access privileges for all the tables used in the cached query.
-- If the query was a SHOW query, the role executing the query must match the role that generated the cached results.

Snowflake: How can I find out which internal table stages consumes most storage?

In my snowflake account I can see that there is a lot of storage used by stages. I can see this for example using the following query:
select *
from table(information_schema.stage_storage_usage_history(dateadd('days',-10,current_date()),current_date()));
There are no named stages in the databases. All used storage must be in internal stages.
How can I find out which internal stages consumes most storage?
If I know the name of the table I can list all files in the table stage using something like this:
list #SCHEMANAME.%TABLENAME;
My problem is that there are hundreds of tables in the databases and I have no idea which tables to query.
There is an ACCOUNT_USAGE view called STAGE_STORAGE_USAGE_HISTORY in the Snowflake database/share that will give you everything, including internal stages. I would use this over the information_schema view, since that is limited to what your role currently has access to.
https://docs.snowflake.com/en/sql-reference/account-usage/stage_storage_usage_history.html
You can use the view STAGES in Information schema or account usage to get the stages. Do note that Account usage has higher retention period than information schema and data retrieval is faster. You can read more here
If I understood correctly,you want to do something with stages to reduce the overall billing or storage size
Snowflake Stages
'Internal Named' and 'External' stages :
These are the only stages which can be altered or dropped and controlled by user
User Stage and Table stages are one which can not be altered or dropped,
managed by snowflake
So even if you could identify those Table and User specific stages , you can not drop those.
Snowflake Storage consumption includes below three components for billing
1. Databases size
2. Stages size
3. Fail Safe size
The size of the storage occupied can be visible (Only when you have Accountadmin role
or MONITOR privs) under below location webUI tab
Account Tab ---> Usage --> Average Storage Used
Note : on the account tab, No DB object wise storage billing details available
So how you can see the storage consumption of the associated tables (including their fail safe and time travel bit) and stage details
select * from <DB Name>."INFORMATION_SCHEMA"."TABLE_STORAGE_METRICS"
select * from <DB Name>."INFORMATION_SCHEMA"."STAGES"
Hope clarification helped
Thanks
Palash Chatterjee

How to find the count of total connections in snowflakes

We know that we have "show transactions" to see the transactions currently connected to database.
But I am interested
- To get the count of active users for each warehouse?
-History of connections count for each warehouse?
Is there a way to get above information using the sql commands (not the web ui)
If I understood correctly, you want to see the warehouse and active user mapping. There is no direct views as per my knowledge but you can leverage provided query where by keeping warehouse size !='0' you can tied warehouse and user together. You can check the below link
https://docs.snowflake.com/en/sql-reference/account-usage/query_history.html
Before that
Snowflake Sessions are not tagged with user name or account , those are system
generated ID.
User and warehouse relationship is zero or many (An active user can use multiple warehouse in parallel , also a warehouse can be used by multiple users at same point of time)
A user can have active session without a running warehouse
It is not mandatory to have an active user to keep your warehouse running
Finally, queries can also be executed without turning the warehouse up
SELECT TO_CHAR(DATE_TRUNC('minute', query_history.START_TIME ),'YYYY-MM-DD
HH24:MI') AS "query_history.start_time",
query_history.WAREHOUSE_NAME AS "query_history.warehouse_name",
query_history.USER_NAME AS "query_history.user_name"
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY AS query_history
WHERE (query_history.WAREHOUSE_SIZE != '0')
GROUP BY DATE_TRUNC('minute', query_history.START_TIME ),2,3
ORDER BY 1 DESC
Note : Above SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view refresh has latency of 45 minutes

Cognos Report Studio: Cascading Prompts populating really slow

I've a Cognos report in which I've cascading prompts. The Hierarchy is defined in the image attached.
The First Parent (Division) fills the two cascading child in 3-5 seconds.
But when I select any Policy, (that will populate the two child beneath) it took around 2 minutes.
Facts:
The result set after two minutes is normal (~20 rows)
The Queries behind all the prompts are simple Select DISTINCT Col_Name
Ive created indexes on all the prompt columns.
Tried turning on the local cache and Execution Method to concurrent.
I'm on Cognos Report Studio 10.1
Any help would be much appreciated.
Thanks,
Nuh
There is an alternative to a one-off dimension table. Create a Query Subject in Framework for your AL-No prompt. In the query itself, build a query that gets distinct AL-No (you said that is fast, probably because there is an index on AL-No). Wrap that in a select that does a filter on ' #prompt('pPolicy')#' (assuming your Policy Prompt is keyed to ?pPolicy?)
This will force the Policy into the sql before it is sent to the database, but wrapping on the distinct AL-No will allow you to use the AL-No index.
select AL_NO from
(
select AL_NO, Policy_NO
from CLAIMS
group by AL_NO, Policy_NO
)
where Policy_NO = #prompt('pPolicyNo')#
Your issue is just too much table scanning. Typically, one would build a prompt page from dimension-based tables, not the fact table, though I admit that is not always possible with cascading prompts. The ideal solution is to create a one-off dimension table with these distinct values, then model that strictly for the prompts.
Watch out for indexing each field, as the indexes will not be used due to the selectivity of the values. A compound index of the fields may work instead. As with any time you are making changes to the DDL - open SQL profiler and see what SQL Cognos is generating, then run an explain plan before/after the changes.

Resources