How to find the count of total connections in Snowflake

I know that "show transactions" lists the transactions currently running against the database.
But I am interested in:
- The count of active users for each warehouse
- The history of the connection count for each warehouse
Is there a way to get this information using SQL commands (not the web UI)?

If I understood correctly, you want to see the mapping between warehouses and active users. As far as I know there is no view that provides this directly, but the query below, which keeps only rows where the warehouse size != '0', ties warehouses and users together. See the documentation for the view it uses:
https://docs.snowflake.com/en/sql-reference/account-usage/query_history.html
Before that, a few points to keep in mind:
- Snowflake sessions are identified by system-generated IDs, not by user name or account.
- The user-to-warehouse relationship is zero-to-many: an active user can use multiple warehouses in parallel, and a warehouse can be used by multiple users at the same time.
- A user can have an active session without a running warehouse.
- A warehouse can keep running without any active user.
- Finally, some queries can execute without starting a warehouse at all.
SELECT TO_CHAR(DATE_TRUNC('minute', query_history.START_TIME), 'YYYY-MM-DD HH24:MI') AS "query_history.start_time",
       query_history.WAREHOUSE_NAME AS "query_history.warehouse_name",
       query_history.USER_NAME AS "query_history.user_name"
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY AS query_history
WHERE (query_history.WAREHOUSE_SIZE != '0')
GROUP BY DATE_TRUNC('minute', query_history.START_TIME), 2, 3
ORDER BY 1 DESC;
Note: the SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view has a refresh latency of up to 45 minutes.
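The question asks for counts rather than a list of users. A minimal sketch of one way to get them, assuming that "active" means a distinct user who ran at least one query on the warehouse within a given hour (that definition, the hourly bucket, and the 7-day window are assumptions, not something Snowflake defines):

```sql
-- Distinct users per warehouse per hour over the last 7 days.
-- Rows with no warehouse (metadata-only or cached queries) are excluded.
SELECT DATE_TRUNC('hour', start_time) AS usage_hour,
       warehouse_name,
       COUNT(DISTINCT user_name)      AS active_users,
       COUNT(DISTINCT session_id)     AS sessions
FROM snowflake.account_usage.query_history
WHERE warehouse_name IS NOT NULL
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY 1 DESC, 2;
```

This counts users and sessions that issued queries, so idle sessions that never ran anything on a warehouse will not show up.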

Related

What is the best way to query number of customers subscribed during a particular time period

We want to enable our users to get insights from the data. We have been using Tableau as a self-service BI platform. Our dashboard users have a specific request; they want to see data within a particular time period
My dataset looks like the table below (dates are mm/dd/yyyy).
User request - To see how many customers were subscribed within a time period regardless of their current status. i.e. even if their current status is cancelled as long as they were active during the user-provided time period they should be counted
Example - User selects time range to be 01/01/2020 - 03/31/2020. Running a query on below data set should return count of 3. [CUST1 as they cancelled after 03/31/2020, CUST3 as they signed up before 01/01/2020 but are still active, CUST5 as they were active for some point during that period]
Problem - While I can write a SQL query with abundant where clauses to achieve this, we want a self-service automated way i.e. we want users to just provide us the time range and get the number. How do we achieve this in a BI tool like Tableau? I am also open to other tools, changing the data model design and other options. The goal is to just make this automated rather than having a person manually update and run a SQL query
Customer ID | Subscription Start Date | Subscription End Date | Subscription Status
CUST1       | 10/11/2019              | 04/12/2020            | Cancelled
CUST2       | 01/12/2020              |                       | Active
CUST3       | 05/01/2019              |                       | Active
CUST4       | 06/07/2012              | 07/08/2012            | Cancelled
CUST5       | 01/12/2020              | 03/14/2020            | Cancelled
CUST6       | 04/12/2020              |                       | Active
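The "active at some point during the range" rule is an interval-overlap test: a subscription counts if it started on or before the end of the range and either has no end date or ended on or after the start of the range. A hedged sketch in Snowflake SQL, assuming a hypothetical table named subscriptions with the columns shown above and an empty end date stored as NULL:

```sql
-- The two range bounds are the only inputs a user has to supply.
SET range_start = '2020-01-01';
SET range_end   = '2020-03-31';

SELECT COUNT(DISTINCT customer_id) AS subscribed_customers
FROM subscriptions  -- hypothetical table name
WHERE subscription_start_date <= TO_DATE($range_end)
  AND (subscription_end_date IS NULL
       OR subscription_end_date >= TO_DATE($range_start));
```

Because the whole rule reduces to two inputs, the same predicate can be driven by two date parameters in a BI tool instead of session variables. Note that on the sample rows this condition also matches CUST2, which started inside the range and is still active.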

Does Snowflake charge me for queries with WAREHOUSE_SIZE=NULL?

I am trying to reduce the cost of our Snowflake queries. The docs explain in detail how many credits per second are consumed for each warehouse size. When I look at the history tab in the web console, or at the 'snowflake.account_usage.query_history' view, I see that some queries have a NULL value in the WAREHOUSE_SIZE column (commit statements, desc table etc.). Does anyone know how these types of queries are charged? Maybe they are free?
This doesn't seem to be mentioned anywhere in the docs.
The result of any query remains available for the following 24 hours in the result cache, which is served by the services layer. Queries that appear to have run without a warehouse therefore did not actually use one.
However, that does not necessarily mean that the warehouse they would otherwise have used was not running at that time.
Say, for example, another query 'Q1' was executed on "your" warehouse 'MyWH' just before you ran your 'Q2': while yours will hit the cache without needing 'MyWH' running, 'Q1' will still cause 'MyWH' to resume and therefore consume credits.
More details on Snowflake Caching are available here: https://community.snowflake.com/s/article/Caching-in-Snowflake-Data-Warehouse
Queries without a warehouse listed are not charged through compute credits. These queries use the "cloud services" layer in Snowflake and are charged differently.
The following query shows whether cloud services are being used, and how much:
use schema snowflake.account_usage;
select query_type, sum(credits_used_cloud_services) cs_credits, count(1) num_queries
from query_history
where true
and start_time >= timestampadd(day, -1, current_timestamp)
group by 1
order by 2 desc
limit 10;
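Cloud services credits are only billed for the portion of a day's usage that exceeds 10% of that day's warehouse compute credits, so the per-query totals above can overstate what is actually charged. A sketch that compares used and billed credits per day, assuming access to the SNOWFLAKE.ACCOUNT_USAGE.METERING_DAILY_HISTORY view (the 7-day window is an arbitrary choice):

```sql
-- Daily compute vs. cloud services credits, with the billing adjustment.
-- credits_adjustment_cloud_services is zero or negative: it removes the
-- cloud services usage that falls under the daily 10% allowance.
SELECT usage_date,
       SUM(credits_used_compute)              AS compute_credits,
       SUM(credits_used_cloud_services)       AS cloud_services_credits,
       SUM(credits_adjustment_cloud_services) AS cloud_services_adjustment,
       SUM(credits_billed)                    AS billed_credits
FROM snowflake.account_usage.metering_daily_history
WHERE usage_date >= DATEADD(day, -7, CURRENT_DATE())
GROUP BY usage_date
ORDER BY usage_date DESC;
```

If the adjustment cancels out the cloud services credits on a given day, the warehouse-less queries from that day cost nothing extra.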

Snowflake Alert Long Running Queries

How can I send long-running query alerts to multiple users in Snowflake?
Right now the alert is sent only to users with the account admin role.
Is there any way to send the long-running query alert to the user running the query, or to multiple users who belong to a particular warehouse/database?
Is there any way to leverage a Snowflake Notification Integration for these alerts?
Thanks In Advance
Sundar
This requirement can be met by using alerts and email notifications.
Setting Up Alerts Based on Data in Snowflake:
In some cases, you might want to be notified or take action when data in Snowflake meets certain conditions. For example, you might want to receive a notification when:
- The warehouse credit usage increases by a specified percentage of your current quota.
- The resource consumption for your pipelines, tasks, materialized views, etc. increases beyond a specified amount.
- A data access request is received from an unauthorized user.
- Your data fails to comply with a particular business rule that you have set up.
To do this, you can set up a Snowflake alert. A Snowflake alert is a schema-level object that specifies:
- A condition that triggers the alert (e.g. the presence of queries that take longer than a second to complete).
- The action to perform when the condition is met (e.g. send an email notification, capture some data in a table, etc.).
- When and how often the condition should be evaluated (e.g. every 24 hours, every Sunday at midnight, etc.).
Sample:
CREATE OR REPLACE ALERT alert_long_queries
  WAREHOUSE = my_warehouse_name
  SCHEDULE = '5 MINUTE'
  IF (EXISTS (
    SELECT *
    FROM TABLE(SNOWFLAKE.INFORMATION_SCHEMA.QUERY_HISTORY())
    WHERE EXECUTION_STATUS ILIKE 'RUNNING'
      AND start_time < current_timestamp() - INTERVAL '5 MINUTES'
  ))
  THEN CALL SYSTEM$SEND_EMAIL(...);
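For the "multiple users" part, the action can be an email sent through a notification integration. The sketch below is illustrative only: the integration name my_email_int and the example.com addresses are hypothetical, and the recipients are assumed to be verified email addresses of users in the account.

```sql
-- Email notification integration (hypothetical name and recipients).
CREATE NOTIFICATION INTEGRATION my_email_int
  TYPE = EMAIL
  ENABLED = TRUE
  ALLOWED_RECIPIENTS = ('first.user@example.com', 'second.user@example.com');

-- Arguments: integration name, comma-separated recipients, subject, body.
CALL SYSTEM$SEND_EMAIL(
  'my_email_int',
  'first.user@example.com, second.user@example.com',
  'Long-running query detected',
  'At least one query has been running for more than 5 minutes.'
);

-- A newly created alert is suspended; resume it so the schedule starts.
ALTER ALERT alert_long_queries RESUME;
```

A call of this shape is what would go in the THEN clause of the alert. Notifying the specific user who ran the query would need extra logic to look up that user's address, which this sketch does not attempt.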
The only notification available out of the box in Snowflake is the resource monitor, and only members of the ACCOUNTADMIN role can subscribe to its notifications.
https://docs.snowflake.com/en/user-guide/resource-monitors.html#resource-monitor-properties

Snowflake: How can I find out which internal table stages consume the most storage?

In my snowflake account I can see that there is a lot of storage used by stages. I can see this for example using the following query:
select *
from table(information_schema.stage_storage_usage_history(dateadd('days',-10,current_date()),current_date()));
There are no named stages in the databases. All used storage must be in internal stages.
How can I find out which internal stages consume the most storage?
If I know the name of the table I can list all files in the table stage using something like this:
list @SCHEMANAME.%TABLENAME;
My problem is that there are hundreds of tables in the databases and I have no idea which tables to query.
There is an ACCOUNT_USAGE view called STAGE_STORAGE_USAGE_HISTORY in the Snowflake database/share that will give you everything, including internal stages. I would use this over the information_schema view, since that is limited to what your role currently has access to.
https://docs.snowflake.com/en/sql-reference/account-usage/stage_storage_usage_history.html
You can use the STAGES view in the information schema or in account usage to get the stages. Note that account usage has a longer retention period than the information schema, and data retrieval is faster.
If I understood correctly, you want to do something with stages to reduce the overall bill or storage size.
Snowflake stages:
- 'Internal Named' and 'External' stages are the only stages that can be altered or dropped and are controlled by the user.
- User stages and table stages cannot be altered or dropped; they are managed by Snowflake.
So even if you could identify those table-specific and user-specific stages, you cannot drop them.
Snowflake storage consumption includes the following three components for billing:
1. Database size
2. Stage size
3. Fail-safe size
The amount of storage used is visible (only with the ACCOUNTADMIN role or MONITOR privileges) in the web UI under:
Account Tab ---> Usage --> Average Storage Used
Note: the Account tab does not break storage billing down by database object.
To see the storage consumption of individual tables (including their Fail-safe and Time Travel portions) and the stage details:
select * from <DB Name>."INFORMATION_SCHEMA"."TABLE_STORAGE_METRICS";
select * from <DB Name>."INFORMATION_SCHEMA"."STAGES";
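With hundreds of tables, one workable approach is to generate the LIST statements instead of typing them, then total the file sizes of each result. This is a sketch under assumptions: table and schema names are simple unquoted identifiers, and my_schema and my_table below are placeholders.

```sql
-- Step 1: generate one LIST statement per table in the current database.
SELECT 'LIST @' || table_catalog || '.' || table_schema || '.%' || table_name || ';' AS list_stmt
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
ORDER BY table_schema, table_name;

-- Step 2: run a generated statement, then total its output.
-- LIST returns lowercase column names, so "size" must be double-quoted.
LIST @my_schema.%my_table;

SELECT COUNT(*)    AS file_count,
       SUM("size") AS total_bytes
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```

Running step 2 for each generated statement (by script or by hand) shows which table stages hold the most bytes. Although a table stage cannot be dropped, the files in it can be deleted with REMOVE once they are no longer needed.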
Hope this clarification helps.
Thanks
Palash Chatterjee

How does warehouse size change automatically in Snowflake?

I have a small warehouse in Snowflake, with minimum clusters = 1, maximum clusters = 5 and the scaling policy set to standard. However, when I was viewing the query history profile, I saw that for some of the queries the size column was set to large, but the cluster number remained 1.
Now, I know that autoscaling helps increase the number of clusters, but how did the warehouse size change for some queries without manual intervention?
I checked the official Snowflake documentation, but couldn't find any way to change the size of a warehouse automatically.
Snowflake has no feature that automatically alters the size of your warehouse.
It is likely that the tools in use (or users) ran an ALTER WAREHOUSE <name> SET WAREHOUSE_SIZE=LARGE. The purpose may have been to prepare for a larger operation, ensuring adequate performance temporarily.
Use the various history views to find out who/what and when such a change was run. For example, the QUERY_HISTORY view could be useful in finding the username and role that was used to alter the warehouse size, with the following query:
SELECT DISTINCT user_name, role_name, query_text, session_id, start_time
FROM snowflake.account_usage.query_history
WHERE query_text ILIKE 'ALTER%SET%WAREHOUSE_SIZE%=%LARGE%'
AND start_time > CURRENT_TIMESTAMP() - INTERVAL '7 days';
Then you could use LOGIN_HISTORY view to find which IP the user authenticated from during the time (or use the history UI for precise client information), check all other queries executed in the same session, etc.
To prevent unauthorized users from modifying warehouse sizes, consider restricting warehouse-level grants on their roles (rolename in use can be detected by the query above).
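A hedged sketch of what restricting those grants could look like, assuming a hypothetical warehouse my_wh and role analyst_role. Resizing a warehouse requires the MODIFY privilege (or ownership) on it, so revoking MODIFY blocks ALTER WAREHOUSE ... SET WAREHOUSE_SIZE for that role:

```sql
-- See which roles currently hold privileges on the warehouse.
SHOW GRANTS ON WAREHOUSE my_wh;

-- Take away the ability to change warehouse properties such as size.
REVOKE MODIFY ON WAREHOUSE my_wh FROM ROLE analyst_role;

-- USAGE (run queries) and OPERATE (resume/suspend) can stay in place.
GRANT USAGE, OPERATE ON WAREHOUSE my_wh TO ROLE analyst_role;
```

If the role owns the warehouse, revoking MODIFY is not enough; ownership would have to move to a more restricted administrative role first.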
