Snowflake Caching Validation - snowflake-cloud-data-platform


Is the cache valid once it has been retrieved, even after the warehouse has been suspended?
For example, a BI tool runs the same query each time any of several users visits the page. The cache helps in this case, but can it still be used once the warehouse is stopped? If not, then in a system where warehouse uptime is what gets charged, the cache gives a speed benefit but no cost benefit.

@Himanshu gave a good explanation. Adding how it looks in the query profiler for all three scenarios.
Metadata cache - for any query such as:
select count(*) from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CATALOG_SALES;

If you are talking about the warehouse cache, it gets purged once the warehouse is stopped and restarted.
Snowflake has the following caches available:
a. Metadata cache – holds object information and statistics. (It is also called the metadata layer, service layer, or cloud services layer.)
b. Result cache – holds query results for 24 hours. Each reuse resets the 24-hour retention period, up to a maximum of 31 days after the result was first generated. (It is also called the result set cache, 24-hour result cache, or query result cache.)
c. Warehouse cache – holds data locally as long as the warehouse is running. (The cache is purged when the warehouse is suspended, so a resumed warehouse starts with a cold cache.)
(It is also called the local cache, SSD cache, raw data cache, or data cache.)
(Users cannot see each other's results, but the result cache can reuse one user's cached result and present it to another user.)
Since you describe re-running the same query, I think you are talking about the result cache. Snowflake uses it if all of the following hold:
The query is syntactically the same.
The user has permission on the tables.
The data in the tables used in the query has not changed.
The query does not contain time functions such as CURRENT_TIMESTAMP() and does not use UDFs.
The table micro-partitions have not been changed or re-clustered.
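
To make the result cache retention rule concrete, here is a small Python sketch of the arithmetic as I understand it from the Snowflake documentation: each reuse resets a 24-hour window, capped at 31 days after the first execution. The function and constant names are made up for illustration; this only models the timing and does not talk to Snowflake.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)     # window that each reuse resets
MAX_LIFETIME = timedelta(days=31)   # hard cap counted from the first execution


def result_cache_expiry(first_run, reuse_times):
    """Return when a cached result would expire, given the times it was reused."""
    hard_limit = first_run + MAX_LIFETIME
    expiry = min(first_run + RETENTION, hard_limit)
    for reused_at in sorted(reuse_times):
        if reused_at >= expiry:
            # The result was already purged, so this run would be a fresh execution.
            break
        expiry = min(reused_at + RETENTION, hard_limit)
    return expiry


first = datetime(2024, 1, 1, 9, 0)
reuses = [first + timedelta(hours=20), first + timedelta(hours=40)]
print(result_cache_expiry(first, reuses))
```

Note that none of this depends on the warehouse being up: the result cache lives in the services layer, which is why a suspended warehouse does not invalidate it.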

Related

Snowflake caching not working with Snowpark Python 1.0

Snowflake caches already executed queries, so that subsequent executions of the same query finish much faster. As far as I know, this only works if the query is exactly the same.
In my app (an interactive dashboard which uses Snowpark-Python 1.0 to fire Snowflake queries), this caching does not seem to work. Each time the (same Snowpark) query is fired, Snowflake runs the query again.
Depending on whether the warehouse cache is active or not (blue vs. green bar), the execution time ranges from several hundred ms up to 2 s. The result is not read from the cache.
I think the cause is that the generated SQL contains random components (column and table name suffixes) which are different for each query.
How can I make use of the snowflake cache using Snowpark-generated queries?
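
The suspicion in this question can be illustrated with a toy model in Python. It assumes the cache is keyed on the exact query text, which is a simplification, and every name here (`generated_sql`, `ToyResultCache`, the alias format) is hypothetical rather than actual Snowpark output.

```python
import random
import string


def generated_sql(rng=None):
    """Build query text; with an rng, append a random alias suffix (hypothetical format)."""
    if rng is None:
        suffix = "fixed"
    else:
        suffix = "".join(rng.choices(string.ascii_lowercase, k=8))
    return f'SELECT COUNT(*) AS "CNT_{suffix}" FROM CATALOG_SALES'


class ToyResultCache:
    """Toy cache keyed on exact query text; not how Snowflake is implemented."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def run(self, sql):
        if sql in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[sql] = "result"
        return self._store[sql]


rng = random.Random(0)
randomized = ToyResultCache()
for _ in range(3):
    randomized.run(generated_sql(rng))   # text differs on every call

stable = ToyResultCache()
for _ in range(3):
    stable.run(generated_sql())          # identical text on every call

print(randomized.hits, randomized.misses, stable.hits, stable.misses)
```

In this model, any generator that emits stable text would hit the cache after the first run, while random suffixes miss every time.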

Cache BigQuery queries

I am building an App Engine Flexible front end that uses BigQuery data. However, the queries run for about 30 seconds. Is there a way to cache these somewhere so that the data is returned faster?

One option is to use configuration.query.useQueryCache.
This property tells BigQuery whether to look for the result in the query cache. The query cache is a best-effort cache that is flushed whenever tables in the query are modified. It is only available when the query does not have a destination table specified.
See more in Using Cached Query Results
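
As a rough sketch of where that property sits, the Python below builds a dict shaped like a BigQuery query job resource. The helper names (`build_query_job`, `may_use_cache`) and the table identifiers are made up for illustration, and nothing is sent to BigQuery.

```python
import json


def build_query_job(sql, use_query_cache=True, destination_table=None):
    """Return a dict shaped like a BigQuery job resource for a query job."""
    query_config = {"query": sql, "useQueryCache": use_query_cache}
    if destination_table is not None:
        # With a destination table set, the query cache is not available.
        query_config["destinationTable"] = destination_table
    return {"configuration": {"query": query_config}}


def may_use_cache(job):
    """True if the job asks for the cache and has no destination table."""
    query_config = job["configuration"]["query"]
    wants_cache = bool(query_config.get("useQueryCache", True))
    return wants_cache and "destinationTable" not in query_config


job = build_query_job("SELECT COUNT(*) FROM my_dataset.my_table")
print(json.dumps(job, indent=2))
print(may_use_cache(job))
```

Because the cache is best effort, the App Engine front end may still want its own cache (for example, in memory) for results that can tolerate being slightly stale.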

Does index rebuilding + updating statistics actually fix the issue of skewed cached plans?

Background
We have an export functionality which allows the user to export measurements from many devices for a certain time range.
The database is continuously being updated with sensor data from many devices (at 10 to 50 measurements per second from each device). So since these CSV exports often contain millions of rows (each row containing data from several different devices, where each device is in a separate database), it was written to fetch data from these tables in chunks (I guess to avoid slowing down inserts, since exporting is a lower priority task).
Problem
When the first chunk is queried, SQL Server will create an execution plan which fits the passed parameters. However, at the beginning of the export, it's possible that ranges of data are missing due to low connectivity, or are flagged differently due to time sync errors, meaning the following queries reusing this cached plan will possibly not get the optimal plan for their parameters.
One option is to add OPTION (RECOMPILE) to each query, but many sources claim that this imposes an unnecessary burden on the CPU and that merely updating statistics on a regular basis should help SQL Server create a better plan.
However, this doesn't make much sense to me, because even if the plan cache is invalidated, the next time a user runs the query, the first chunk will again dictate the "shape" of the execution plan.
Is this reasoning correct?
Should I actually use OPTION (RECOMPILE) with each query, or just add an "Update Statistics" maintenance plan?

SQL DMV Queries & Cached Plans

My understanding is that some of the DMVs in SQL Server depend on query plans being cached. My questions are these: Are all query plans cached? If not, when is a query plan not cached? For ones that are cached, how long do they stay in the cache?
Thanks very much

Some of the SQL Server DMVs, namely those that capture information relating directly to the query plan cache, are at the mercy of the memory pressure placed on that cache (due to ad hoc queries, other memory usage and high activity, or recompilation). The query plan cache is subject to plan aging (e.g. a plan with a cost of 10 that has been referenced 5 times has an "age" value of 50):
If the following criteria are met, the plan is removed from memory:
· More memory is required by the system
· The "age" of the plan has reached zero
· The plan isn't currently being referenced by an existing connection
Ref.
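
The aging rule can be sketched as a toy model in Python. The "age = cost x references" part and the three eviction criteria come from the description above; the decrement of 1 per sweep is my assumption, and all names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class CachedPlan:
    name: str
    cost: int
    references: int = 0
    age: int = 0
    in_use: bool = False

    def reference(self):
        # Each reuse adds the plan's cost, so age == cost * references.
        self.references += 1
        self.age += self.cost


def sweep(plans, memory_needed):
    """One aging pass: decrement each age, then evict plans meeting all three criteria."""
    survivors = []
    for plan in plans:
        plan.age = max(plan.age - 1, 0)  # assumed decrement of 1 per pass
        evict = memory_needed and plan.age == 0 and not plan.in_use
        if not evict:
            survivors.append(plan)
    return survivors


report = CachedPlan("report", cost=10)
for _ in range(5):
    report.reference()          # age becomes 10 * 5 = 50

adhoc = CachedPlan("adhoc", cost=1)
adhoc.reference()               # age becomes 1

remaining = sweep([report, adhoc], memory_needed=True)
print([plan.name for plan in remaining], report.age)
```

The cheap, rarely used plan drops out after one sweep under memory pressure, while the expensive, frequently referenced plan survives many more.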
Those DMVs not directly relating to the query plan cache are flushed under 'general' memory pressure (cached data pages) or if the SQL Server service is restarted.
The factors affecting query plan caching have changed slightly since SQL Server 2000. The up-to-date reference for SQL Server 2008 is here: Plan Caching in SQL Server 2008

I just want to add some geek minutiae: the query plan cache leverages the general caching mechanism of SQL Server. These caches use the Clock algorithm for eviction; see Q and A: Clock Hands - what are they for. For query plan caches, the cost of the entry takes into consideration the time, IO and memory needed to create the cache entry.
For ones that are cached, how long do they stay in the cache?
A valid object stays in cache until the clock hand decrements the cost to 0. See sys.dm_os_memory_cache_clock_hands. There is no absolute time answer to this question: the clock hand could decrement an entry to 0 in a second, in an hour, in a week or in a year. It all depends on the initial cost of the entry (query/schema complexity), on the frequency of reusing the plan, and on the clock hand's speed (memory pressure).
A cached object may be invalidated, though. The various reasons why a query plan gets invalidated are explained in great detail in the white paper linked by Mitch: Plan Caching in SQL Server 2008.

How can an improvement to the query cache be tracked?

I am parameterizing my web app's ad hoc SQL. As a result, I expect the query plan cache to shrink and to have a higher hit ratio. Perhaps other important metrics will improve as well.
Could I use perfmon to track this? If so, what counters should I use? If not perfmon, how could I report on the impact of this change?

The SQL Server, Plan Cache object has these counters:
Cache Hit Ratio: ratio between cache hits and lookups.
Cache Object Counts: number of cache objects in the cache.
Cache Pages: number of 8-kilobyte (KB) pages used by cache objects.
Cache Objects in use: number of cache objects in use.
Also sys.dm_os_memory_clerks and sys.dm_os_memory_cache_counters will give information about memory allocations (in general) and SQL caches (in general). You'll be interested in allocation for the plan cache memory clerk.
And finally there are the execution DMVs: sys.dm_exec_query_stats and sys.dm_exec_cached_plans.
These counters and DMVs should cover what you need, for more details see Execution Plan Caching and Reuse.
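
To report on the impact, one approach is to sample these counters before and after the change and compare them. The Python sketch below shows only the arithmetic; the class name, field names, and sample numbers are all made up for illustration, and collecting the values (from perfmon or the DMVs) is left out.

```python
from dataclasses import dataclass

PAGE_KB = 8  # Cache Pages are counted in 8 KB pages


@dataclass
class PlanCacheSample:
    hits: int
    lookups: int
    object_count: int
    pages: int

    @property
    def hit_ratio(self):
        # Ratio between cache hits and lookups; 0.0 if nothing was looked up.
        return self.hits / self.lookups if self.lookups else 0.0

    @property
    def size_mb(self):
        return self.pages * PAGE_KB / 1024


def compare(before, after):
    """Return the change in each metric between two samples."""
    return {
        "hit_ratio_change": after.hit_ratio - before.hit_ratio,
        "object_count_change": after.object_count - before.object_count,
        "size_mb_change": after.size_mb - before.size_mb,
    }


# Hypothetical samples taken before and after parameterizing the ad hoc SQL.
before = PlanCacheSample(hits=600, lookups=1000, object_count=20000, pages=128000)
after = PlanCacheSample(hits=950, lookups=1000, object_count=1500, pages=12800)
print(compare(before, after))
```

A successful parameterization should show up as a higher hit ratio together with a lower object count and fewer cache pages.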

You can use SQL Server Profiler. Create a new trace, and capture the TSQL->Exec Prepared Sql and TSQL->Prepare Sql events. The former will tell you when it's reusing a query plan, the latter when it is regenerating the plan.
You can do the same for Stored Procedures as well, under the SP category of events.
