I am working on creating a user-level monitor of credit usage at a monthly level, and have written a small query as below.
select user_name,
       sum(credits_used_cloud_services) as credit_used,
       date_trunc(month, current_date) as month,
       warehouse_name
from query_history
where start_time >= date_trunc(month, current_date)
group by 1, 3, 4
order by 2 desc;
It seems to be working, but there is one point of confusion: the credit usage field is named
CREDITS_USED_CLOUD_SERVICES in Snowflake, which makes me think it only reports credit usage for the cloud services layer and not the warehouse (compute) layer. If so, this query is not good enough. If my concern is right, can somebody please suggest or guide me to the correct way to get credit usage per user?
Related
Do we have any system tables in Snowflake which give us credit usage information such as:
a. Warehouse level
b. Account level, etc.
Requirement --> We have a requirement to extract this information from Snowflake via the available Snowflake connectors and orchestrate it as per the client's needs.
Regards,
Somen Swain
I think you need the WAREHOUSE_METERING_HISTORY Account Usage view
Credits used by each warehouse in your account (month-to-date):
select warehouse_name,
sum(credits_used) as total_credits_used
from warehouse_metering_history
where start_time >= date_trunc(month, current_date)
group by 1
order by 2 desc;
More Details: https://docs.snowflake.com/en/sql-reference/account-usage.html#examples-warehouse-credit-usage
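If you also want an approximate per-user breakdown of warehouse (compute) credits, one common approach is to apportion each warehouse's metered credits by each user's share of execution time in QUERY_HISTORY. A rough sketch, month-to-date; this time-based apportionment is an approximation for chargeback purposes, not an exact billing figure:
with wh_credits as (
    -- total compute credits per warehouse, month-to-date
    select warehouse_name,
           sum(credits_used) as credits_used
    from snowflake.account_usage.warehouse_metering_history
    where start_time >= date_trunc(month, current_date)
    group by 1
),
user_time as (
    -- each user's execution time per warehouse, plus the warehouse total
    select warehouse_name,
           user_name,
           sum(execution_time) as user_exec_ms,
           sum(sum(execution_time)) over (partition by warehouse_name) as wh_exec_ms
    from snowflake.account_usage.query_history
    where start_time >= date_trunc(month, current_date)
    group by 1, 2
)
select u.user_name,
       u.warehouse_name,
       w.credits_used * u.user_exec_ms / nullif(u.wh_exec_ms, 0) as approx_credits
from user_time u
join wh_credits w
  on u.warehouse_name = w.warehouse_name
order by approx_credits desc;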
I am using Flink SQL 1.13. The goal is to calculate the number of new users in real time.
Due to some constraints, I cannot directly use registration events, because accounts are created as a platform-wide pass. One account can log in to multiple games, and for each game the user is new when they first enter that game. So I can only calculate this by checking, from the login log, whether the account has logged in to this game before. The format of the login log is like:
user_id game_id login_time
111 game1 2021-05-13 01:01:01
111 game3 2021-05-23 02:02:02
The problem is that the volume of login logs grows significantly every day. Although I can save the log into HBase, one day it will still be too large...
Is there any other way to do this? Maybe I could put historical users into a Redis HyperLogLog, but it seems Flink SQL does not have a Redis connector yet... Thanks for your help in advance.
INSERT INTO first_login_stream (user_id, first_login_time)
SELECT
user_id,
FIRST_VALUE(login_time) first_login_time
FROM login_log
GROUP BY user_id
This goes back into your event system / Kafka, which you can then read back in windows for hourly stats (and save those in HBase):
INSERT INTO hbase_stats
SELECT
window_start,
window_end,
count(user_id) user_count
FROM TABLE(
TUMBLE(
TABLE first_login_stream,
DESCRIPTOR(<kafka_ingestion_time>),
INTERVAL '1' HOUR
)
)
GROUP BY
window_start,
window_end
The job state has to be checkpointed/saved (otherwise you'll incur reprocessing of the full log on restart). The state size should only grow with the number of users, not the number of logins (I think; you should validate that).
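For completeness, a minimal sketch of what the first_login_stream table could look like as an upsert-kafka table in Flink SQL 1.13. The topic name, broker address, and formats here are assumptions; and if "new" is defined per game rather than per account, add game_id to the primary key and to the GROUP BY of the first query above:
-- Hypothetical backing table for first_login_stream; adjust names to your setup.
CREATE TABLE first_login_stream (
    user_id          STRING,
    first_login_time TIMESTAMP(3),
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'first_login_stream',
    'properties.bootstrap.servers' = 'localhost:9092',
    'key.format' = 'json',
    'value.format' = 'json'
);
The upsert-kafka connector fits here because the GROUP BY query produces an updating result keyed by user_id, so later logins by the same user simply overwrite (and do not re-count) the existing row.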
I am trying to reduce the costs of our queries in Snowflake. In the docs there is a detailed explanation of the credits per second by the size of the warehouse used. When I look at the history tab in the web console, or at the 'snowflake.account_usage.query_history' view, I see that some of the queries have a NULL value in the WAREHOUSE_SIZE column (commit statements, desc table, etc.). Does anyone know how these types of queries are charged? Maybe they are free?
This doesn't seem to be mentioned anywhere in the docs.
The result of any query is available for the following 24 hours through the service layer in the result cache. Therefore, queries that appear to have run without any warehouse did not actually use any warehouse.
However, that doesn't necessarily mean the warehouse they would otherwise have used was not running at the time.
Say, for example, another query 'Q1' was executed within "your" warehouse 'MyWH' just before you ran your 'Q2': while yours will hit the cache without needing 'MyWH' running, 'Q1' will still cause 'MyWH' to resume and therefore consume credits.
More details on Snowflake Caching are available here: https://community.snowflake.com/s/article/Caching-in-Snowflake-Data-Warehouse
Queries without a warehouse listed are not charged compute credits. These queries use the "cloud services" layer in Snowflake and are charged differently.
A query to determine whether, and how much, cloud services are being used:
use schema snowflake.account_usage;
select query_type, sum(credits_used_cloud_services) cs_credits, count(1) num_queries
from query_history
where true
and start_time >= timestampadd(day, -1, current_timestamp)
group by 1
order by 2 desc
limit 10;
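Note that cloud services credits are only billed to the extent they exceed a daily threshold (10% of the day's compute credits, per the Snowflake docs at the time of writing), so the raw CREDITS_USED_CLOUD_SERVICES numbers can overstate what you actually pay. A sketch of how to see the billed amount after the daily adjustment, assuming the METERING_DAILY_HISTORY account usage view and its adjustment columns:
select usage_date,
       sum(credits_used_compute) as compute_credits,
       sum(credits_used_cloud_services) as cloud_services_credits,
       sum(credits_adjustment_cloud_services) as cloud_services_adjustment,
       sum(credits_billed) as credits_billed
from snowflake.account_usage.metering_daily_history
where usage_date >= dateadd(day, -30, current_date)
group by 1
order by 1 desc;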
I want to get the last accessed timestamp for a table in Snowflake.
Not always ideal, but a quick way to find this for one-off questions is to use QUERY_HISTORY
SELECT START_TIME, *
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE QUERY_TEXT LIKE '%MYSCHEMA.MYTABLE%';
Update: a query to specifically get just the most recent query time. You have to filter out the QUERY_HISTORY queries themselves. This is not especially fast, and it does require that the role running it has access to all the relevant history.
SELECT MAX(START_TIME)
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE QUERY_TEXT ILIKE '%CONFIG.PIPELINE_LOG%'
AND NOT QUERY_TEXT ILIKE '%INFORMATION_SCHEMA.QUERY_HISTORY%';
This is an older question, but I am answering it since Snowflake has added a new feature to track last access spanning 1 year. Since this information wasn't tracked from the creation of older objects, you will only see access history since the tracking started.
There is now a view in "SNOWFLAKE"."ACCOUNT_USAGE"."ACCESS_HISTORY". You can see how to query it by flattening the base_objects_accessed array:
select * from "SNOWFLAKE"."ACCOUNT_USAGE"."ACCESS_HISTORY",
LATERAL FLATTEN(base_objects_accessed) limit 100;
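To get the most recent access time for one specific table from that view, you can filter on the flattened objectName. A sketch; the fully qualified name below is a placeholder, so adjust it to your database, schema, and table:
select max(ah.query_start_time) as last_accessed
from snowflake.account_usage.access_history ah,
     lateral flatten(input => ah.base_objects_accessed) obj
where obj.value:"objectName"::string = 'MYDB.MYSCHEMA.MYTABLE'
  and obj.value:"objectDomain"::string = 'Table';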
I hope this approach helps.
Important note:
This is not always a suitable approach, as a user must have ACCOUNTADMIN access to query the snowflake.account_usage schema, and the query will have some latency since the views are part of account_usage. It will also incur warehouse cost if the data size is large.
select * from "SNOWFLAKE"."ACCOUNT_USAGE"."QUERY_HISTORY"
where
query_text like '%STORE_SALES%' and
query_type = 'SELECT'
order by START_TIME DESC
LIMIT 1
Alternatively, if the requirement is limited to the last 14 days of history, use the History tab in the web UI, which costs nothing and supports simple filter clauses.