Snowflake Credit Usage too high compared to query runtime

I am looking at the queries performed against my warehouse and finding that the credit calculation I'm using doesn't add up to what's shown in Snowflake. As I understand it, credits are charged per second of query time, with a minimum of 60s. So a query that runs for 5s would use 60s worth of credits, but a query that runs for 61s would use 61s worth of credits.
Looking at the query history, limiting only to queries performed on my warehouse, I am only seeing 5 queries for the hour in question (12).
These queries copy their results into an S3 bucket in my AWS account.
If I take the starts and ends of each of these queries and chart time, I am only seeing a total of 455 seconds of query time. With the X-Small warehouse that I'm using (1 credit per hour), that should be only 0.126 credits used for that hour.
But I am seeing 0.66 credits used here:
What am I missing about Snowflake credit usage? Why does it appear that I am using more credits than I should?

Moving answer from comments to an actual answer (for completeness):
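Snowflake credits are billed for the time a warehouse is running, not for the runtime of individual queries. After the last query finishes, the warehouse keeps running (and consuming credits) until it is suspended.
AUTO_SUSPEND can be set to 60 seconds (or less) so that the billed time more closely matches the duration of the queries.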
You can refer to the official Snowflake documentation for more details:
Virtual Warehouse Credit Usage
How are Credits Charged for Warehouses?
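To illustrate the gap between query time and billed time, here is a minimal Python sketch. It assumes a simplified model: the warehouse resumes when a query arrives, stays up until `auto_suspend` seconds after the last query ends, and every resume is billed for at least 60 seconds. The function name, the example timestamps, and the 600-second auto-suspend value are hypothetical and are not taken from the question; real suspend timing is only approximate.

```python
def billed_seconds(queries, auto_suspend=600, minimum=60):
    """Estimate billed warehouse seconds under a simplified model.

    queries: list of (start, end) times in seconds.
    The warehouse is assumed to stay up until `auto_suspend` seconds
    after it goes idle; each resume is billed at least `minimum` seconds.
    """
    total = 0
    run_start = run_end = None
    for start, end in sorted(queries):
        if run_start is None:
            run_start, run_end = start, end
        elif start <= run_end + auto_suspend:
            # Warehouse is still running: the query extends the current run.
            run_end = max(run_end, end)
        else:
            # Warehouse had suspended: close the previous run, start a new one.
            total += max(run_end + auto_suspend - run_start, minimum)
            run_start, run_end = start, end
    if run_start is not None:
        total += max(run_end + auto_suspend - run_start, minimum)
    return total


def credits(seconds, credits_per_hour=1.0):
    """Convert billed seconds to credits (X-Small = 1 credit per hour)."""
    return seconds / 3600 * credits_per_hour


# Hypothetical hour: five 91-second queries (455 s of query time in total).
example = [(0, 91), (700, 791), (1400, 1491), (2100, 2191), (2800, 2891)]

query_only = sum(end - start for start, end in example)
print("query time only:   ", round(credits(query_only), 3))
print("auto-suspend 600 s:", round(credits(billed_seconds(example, 600)), 3))
print("auto-suspend 60 s: ", round(credits(billed_seconds(example, 60)), 3))
```

Under these assumptions the same 455 seconds of queries are billed very differently depending on how long the warehouse idles before suspending, which is the effect described in the answer.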

Latency in Snowflake Account Usage Views

I am trying to understand the "latency" issue with Account Usage views.
Does the latency, let's say the 45 minutes mentioned for Query History, mean that it might take 45 minutes for a query to pull results out of an Account Usage view, or does it mean that it might take that long for data to become available in the Account Usage view?
When I query Account Usage in a trial account, the query doesn't take much time, and the Account Usage view also shows the latest SQL details in Query History, so I am not able to understand what the latency denotes.
Another question: if latency means the amount of time the SQL will take to pull results, I assume it will keep the warehouse in a running state, increasing the cost.
Data latency
Due to the process of extracting the data from Snowflake’s internal metadata store, the account usage views have some natural latency:
For most of the views, the latency is 2 hours (120 minutes).
For the remaining views, the latency varies between 45 minutes and 3 hours.
For details, see the list of views for each schema (in this topic). Also, note that these are all maximum time lengths; the actual latency for a given view when the view is queried may be less.
"Does the latency, let's say for Query History mentioned to be 45 min, mean it might take 45 min for a query to pull result out of Account Usage view or does it mean it might take time for data to be available in Account Usage view?"
The term latency refers to the time until the data becomes available in the Account Usage view.
It does not mean that query SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.<some_view> takes 45 minutes to execute.

Regarding the data amount limit of Snowflake

Are there any restrictions such as the following?
・ Maximum number of records per table
・ Maximum capacity limit per table
・ Limit on the number of tables that can be created in Snowflake
There are no limits like that as of now.
For more read here.
I've used Snowflake extensively. In a previous position, we had more than 50 Petabytes of data in Snowflake, spread out over more than 10,000 tables (I don't have the exact number, I just stopped counting at 10,000).
In my current position, we have a single table with more than 100 TB of data - that is the compressed size on Snowflake. We can run text search queries on this table in a matter of seconds.
Snowflake scales really well and has no "limits" on things like this, per se. Just be aware that with the pay-for-what-you-use pricing model, your budget can be the limiting factor.
I've heard a horror story about a company that shifted their entire process into Snowflake with no apparent problems, but their first month's bill exceeded their entire project's budget. Find ways to learn from mistakes while they're small, and impose your own limits while you figure out how to optimize things for cost.

Query compilation and provisioning times

What does it mean when COMPILATION_TIME, QUEUED_PROVISIONING_TIME, or both are longer than usual?
I have a query that runs every couple of minutes, and it usually takes less than 200 milliseconds for compilation and 0 for provisioning. There were 2 instances in the last couple of days where the values were more than 4000 for compilation and more than 100000 for provisioning.
Does that mean the warehouse was being resumed and there was a hiccup?
COMPILATION_TIME:
The SQL is parsed and simplified, and the table metadata is loaded. Thus a compile for select a,b,c from table_name will be fractionally faster than select * from table_name because the metadata is not needed from every partition to know the final shape.
Super fragmented tables can give poor compile performance, as there is more metadata to load. Fragmentation comes from many small writes/deletes/updates.
Doing very large INSERT statements can give horrible compile performance. We did a lift-and-shift with all data loading done via INSERT; just avoid it.
PROVISIONING_TIME is the amount of time needed to set up the hardware. A long value occurs for two main reasons. Either you are turning on 3X, 4X, 5X, or 6X servers, and it can take minutes just to allocate that volume of servers.
Or there is a failure. Sometimes around releases there can be a little instability, where a query fails on the "new" release and is rolled back to older instances, which you would see in the profile as 1, 1001. There have also sometimes been problems in the provisioning infrastructure (I have not seen this for a few years, but am not monitoring for it presently).
But I would think that, on an ongoing basis, you will mostly see this for the first reason.
The compilation process involves query parsing, semantic checks, query rewrite components, reading object metadata, table pruning, evaluating certain heuristics such as filter push-downs, plan generation based on cost-based optimization, etc., all of which count toward COMPILATION_TIME.
QUEUED_PROVISIONING_TIME is the time (in milliseconds) spent in the warehouse queue, waiting for the warehouse compute resources to provision, due to warehouse creation, resume, or resize.
https://docs.snowflake.com/en/sql-reference/functions/query_history.html
To understand in detail why the query recently took a long time, the query ID needs to be analysed. You can raise a support case with Snowflake support, giving the problematic query ID, to have the details checked.

Snowflake Query Profile Interface

I have been playing around with the Snowflake Query Profile Interface, but I am missing information about the parallelism in query execution. Using a Large or XLarge warehouse, it is still only using two servers to execute the query. With an XLarge warehouse, a big sort could be divided into 16 parallel execution threads to fully exploit my warehouse and credits. Or am I wrong?
Example: Having a Medium Warehouse as:
Medium Warehouse => 4 servers
Executing the following query:
select
    sum(o_totalprice) "order total",
    count(*) "number of orders",
    c.c_name "customer"
from
    orders o inner join customer c on c.c_custkey = o.o_custkey
where
    c.c_nationkey in (2,7,22)
group by
    c.c_name
Gives the following Query Plan:
Query Plan
In the execution details I cannot see anything about the participating servers:
Best Regards
Jan Isaksson
In an ideal situation, Snowflake will try to split your query and let every core of the warehouse process a piece of it. For example, if you have a 2XL warehouse, you have 32x8 = 256 cores (each node in a warehouse has 8 cores). So, if a query is submitted, in an ideal situation Snowflake will try to divide it into 256 parts and have each core process a part.
In reality, it may not be possible to parallelize to that extent, either because the query itself cannot be broken down like that (for example, if you are trying to calculate a median) or because the data itself prevents it (for example, if you are trying to run a window function on a column which is skewed).
Hence, it is not always true that if you move to a bigger warehouse your query performance will improve linearly.
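The arithmetic above can be sketched in Python. The 8 cores per node and the node counts for Medium (4) and 2XL (32) come from this thread; the other sizes are assumed to follow the same doubling pattern. The speedup function uses Amdahl's law, a standard textbook model brought in here purely for illustration. It is not a description of how Snowflake actually schedules work, and the 90% parallel fraction is a hypothetical value.

```python
# Nodes per warehouse size. M=4 and 2XL=32 are stated in this thread;
# the remaining entries are assumed from the doubling pattern.
NODES = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16, "2XL": 32}
CORES_PER_NODE = 8  # figure quoted in the answer above


def max_cores(size):
    """Upper bound on the number of parallel pieces for a warehouse size."""
    return NODES[size] * CORES_PER_NODE


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical query in which 90% of the work can be parallelized.
for size in NODES:
    cores = max_cores(size)
    print(f"{size:>3}: {cores:>3} cores, speedup ~{amdahl_speedup(0.9, cores):.1f}x")
```

Under this model the speedup flattens quickly as cores are added, which matches the point that a larger warehouse does not improve performance linearly.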
I tested your query starting with the smallest compute size and then going up. The linear scaling (more compute resources resulting in improved performance) stops around medium size, at which point there is no added performance benefit. This indicates that your query is not big enough to take advantage of more compute resources and that size S is good enough, especially considering cost optimization.

Does accessing the Results cache in Snowflake consume Compute Credits?

If I have run a large query in Snowflake and execute the same query again after 5 minutes without any change to the table etc., it is my understanding that the results will be fetched from the Results Cache. In this case, will it consume Compute Credits?
Not today, BUT if you use an unusually high amount of result cache compared to the compute credits on your account, you will begin to be billed for your services layer consumption. There was an announcement on this in November that is important to understand. Those using the system in an expected fashion won't be affected by this, but it's important to review:
https://www.snowflake.com/blog/whats-new-with-the-snowflake-cloud-services-billing-model/
A few comments and updates about the product: (1) Mike Walton's response below about the upcoming services layer billing is indeed important to be aware of for operations like result caching that were previously free of compute credits. (2) For the conditions required for Snowflake to reuse the result cache, this documentation link gives a comprehensive list: https://docs.snowflake.net/manuals/user-guide/querying-persisted-results.html#retrieval-optimization (3) The same doc link also details how long the result cache is kept: "Each time the persisted result for a query is reused, Snowflake resets the 24-hour retention period for the result, up to a maximum of 31 days from the date and time that the query was first executed. After 31 days, the result is purged and the next time the query is submitted, a new result is generated and persisted."
The Snowflake Support answered your question here: https://community.snowflake.com/s/question/0D50Z000082DhlPSAS/does-a-cached-result-on-a-suspended-warehouse-cost-compute-credits
Compute credits don't get consumed when you use the results cache, so long as the query is exactly the same and the underlying table data hasn't changed. A cached result is purged after 24 hours unless it is reused in the meantime.
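The retention rule quoted from the documentation (a 24-hour window that resets on each reuse, capped at 31 days from the first execution) can be sketched in Python as follows. The function name and the example timestamps are hypothetical, and the sketch models only the retention window, not the other conditions for result reuse.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)    # window reset on each reuse
MAX_LIFETIME = timedelta(days=31)  # hard cap counted from the first execution


def result_purge_time(first_run, reuse_times):
    """Return when a persisted result would be purged under the quoted rule."""
    hard_limit = first_run + MAX_LIFETIME
    expires = first_run + RETENTION
    for reused_at in sorted(reuse_times):
        if reused_at >= expires:
            # The result was already purged, so this run is not a reuse.
            break
        expires = min(reused_at + RETENTION, hard_limit)
    return expires


first = datetime(2024, 1, 1, 0, 0)

# Reused 5 minutes later: the 24-hour window restarts from the reuse.
print(result_purge_time(first, [first + timedelta(minutes=5)]))

# Reused every 23 hours: the result survives, but only up to the 31-day cap.
frequent = [first + timedelta(hours=23 * k) for k in range(1, 50)]
print(result_purge_time(first, frequent))
```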
