I am building an App Engine Flexible front end that uses BigQuery data. However, the queries run for about 30 seconds. Is there a way to cache these somewhere so that the data is returned faster?
One option is to use configuration.query.useQueryCache
This property tells BigQuery whether to look for the result in the query cache. The query cache is a best-effort cache that is flushed whenever tables referenced by the query are modified. The query cache is only available when the query does not specify a destination table.
See Using Cached Query Results for more details.
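Note that cache eligibility also depends on the query itself: BigQuery does not cache results for non-deterministic queries. A minimal sketch, using a public sample table purely for illustration:
-- Eligible for the query cache: deterministic, no destination table.
-- A second run with identical text is answered from the cache, and the
-- job statistics report cacheHit = true.
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name;
-- Never cached: CURRENT_TIMESTAMP() makes the result non-deterministic.
SELECT CURRENT_TIMESTAMP() AS queried_at, COUNT(*) AS row_count
FROM `bigquery-public-data.usa_names.usa_1910_2013`;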
Snowflake caches the results of already executed queries, so that subsequent executions of the same query finish much faster. As far as I know, this only works if the query text is exactly the same.
In my app (an interactive dashboard which uses Snowpark-Python 1.0 to fire Snowflake queries), this caching does not seem to work. Each time the same Snowpark query is fired, Snowflake runs it again: depending on whether the warehouse cache is warm or not, execution takes from several hundred milliseconds up to 2 s, and the result is not read from the result cache.
I think the cause is that the generated SQL contains random components (suffixes on column and table names) which differ for each query.
How can I make use of the Snowflake cache with Snowpark-generated queries?
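To make the failure mode concrete, here is a hedged sketch of the mismatch described above (the table and the alias suffixes are made-up examples, not actual Snowpark output):
-- First run: executed on the warehouse; the result is stored in the
-- result cache, keyed on this exact query text.
SELECT "ID" AS "ID_A1B2", "VALUE" AS "VALUE_A1B2" FROM "MEASUREMENTS";
-- Second run: semantically identical, but the generated suffixes differ,
-- so the text no longer matches and the result cache is NOT used.
SELECT "ID" AS "ID_C3D4", "VALUE" AS "VALUE_C3D4" FROM "MEASUREMENTS";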
Is a cached result still valid, and usable, after the warehouse has been suspended?
For example, the same query behind a BI dashboard served to multiple users is executed every time a user visits. The result cache helps in this case, but can it still be used while the warehouse is stopped? If not, the speed benefit only exists while the warehouse is up and being billed, and there is no benefit on the cost side.
@Himanshu gave a good explanation. Adding how it looks in the query profiler for the three scenarios.
Metadata cache – hit by a metadata-only query such as:
select count(*) from SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CATALOG_SALES;
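The other two scenarios can be reproduced along these lines (a hedged sketch; USE_CACHED_RESULT is a documented session parameter, and the sample table comes from SNOWFLAKE_SAMPLE_DATA):
-- Result cache: run the identical text twice; the second execution's
-- profile shows QUERY RESULT REUSE and scans no data.
SELECT ss_store_sk, SUM(ss_net_paid) AS net_paid
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.STORE_SALES
GROUP BY ss_store_sk;
-- Warehouse cache: disable result reuse so the warehouse must serve the
-- data; on a warm warehouse the profile shows a high percentage of the
-- bytes scanned coming from the local (SSD) cache.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT ss_store_sk, SUM(ss_net_paid) AS net_paid
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.STORE_SALES
GROUP BY ss_store_sk;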
Are you talking about the warehouse cache? It gets purged once the warehouse is stopped and restarted.
Snowflake has the following different caches available:
a. Metadata cache – holds object information and statistics (also called the metadata layer, services layer, or cloud services layer).
b. Result cache – holds the last 24 hrs of your results; a query result is retained for a maximum of 31 days after being generated (also called the result set cache, 24-hr result cache, or query result cache).
c. Warehouse cache – holds data locally as long as the warehouse is running; the cache is purged when the warehouse is suspended and is not purged on resume (also called the local cache, SSD cache, raw data cache, or data cache).
(Users cannot see each other's results, but the result cache can re-use one user's result and present it to another user.)
Since you are asking about query results, you are talking about the result cache. Snowflake uses it only if all of the following hold (see the sketch after this list):
The query is syntactically identical.
The user has the required permissions on the tables used in the query.
The data in the tables used in the query has not changed.
The query does not contain runtime functions such as CURRENT_TIMESTAMP() and does not use UDFs.
The table micro-partitions have not changed or been re-clustered.
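A quick way to check these conditions in practice (a hedged sketch; the sample table is from SNOWFLAKE_SAMPLE_DATA):
-- Conditions met: rerunning the identical text is served from the
-- result cache (near-zero elapsed time, no warehouse needed).
SELECT c_birth_country, COUNT(*) AS customers
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER
GROUP BY c_birth_country;
-- Condition broken: CURRENT_TIMESTAMP() is evaluated at execution time,
-- so this result is never served from the result cache.
SELECT CURRENT_TIMESTAMP() AS queried_at, COUNT(*) AS customers
FROM SNOWFLAKE_SAMPLE_DATA.TPCDS_SF100TCL.CUSTOMER;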
I'm working on some Azure databases where I'm not an admin, and I have this issue: while trying to optimize some queries, at some point my queries get cached, and I get falsely "great" results.
How can I prevent my queries from being cached?
I would normally run DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS, but since I'm not an admin, I can't do that.
Thanks!
How can I prevent my queries from being cached?
You can always send trivially different queries. Any change in the query text, including in a comment, will prevent the reuse of a cached plan.
But cached query plans and cached data pages are the normal state of a database. Cold caches are an abnormal condition.
Stepping back, though, you can optimize queries in either state. You should be looking at the query plans and the cost of the queries in CPU and logical IO, which don't depend on whether the query plan or data pages are already cached.
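A minimal sketch of both points (dbo.Orders and the predicate are hypothetical):
-- Changing only a comment changes the query text, so the cached plan is
-- not reused (data pages may still be warm in the buffer pool, though).
SELECT * FROM dbo.Orders WHERE CustomerId = 42; -- variant 1
SELECT * FROM dbo.Orders WHERE CustomerId = 42; -- variant 2
-- Cache-independent cost measures: CPU time and logical reads are
-- reported per statement whether or not plans and pages were cached.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT * FROM dbo.Orders WHERE CustomerId = 42;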
Background
We have an export functionality which allows the user to export measurements from many devices for a certain time range.
The database is continuously updated with sensor data from many devices (10 to 50 measurements per second per device). Since these CSV exports often contain millions of rows (each row combining data from several devices, with each device in a separate database), the export was written to fetch data from these tables in chunks, presumably to avoid slowing down inserts, since exporting is a lower-priority task.
Problem
When the first chunk is queried, SQL Server creates an execution plan that fits the passed parameters. However, at the beginning of the export it's possible that ranges of data are missing due to low connectivity, or are flagged differently due to time-sync errors, meaning that subsequent queries reusing this cached plan will possibly not get the optimal plan for their parameters.
One option is to add OPTION (RECOMPILE) to each query, but many sources claim that this imposes an unnecessary burden on the CPU and that merely updating statistics on a regular basis should help SQL Server create a better plan.
However, this doesn't make much sense to me, because even if the plan cache is invalidated, the next time a user starts an export the first chunk will again dictate the "shape" of the execution plan.
Is this reasoning correct?
Should I actually use OPTION (RECOMPILE) with each query, or just add an "Update Statistics" maintenance plan?
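For reference, a hedged sketch of what the per-chunk hint would look like (table, columns, and parameters are made up, not the actual export code):
-- OPTION (RECOMPILE): the plan is compiled for this chunk's actual
-- parameter values and is not inserted into the plan cache, so a first
-- chunk with atypical data cannot dictate the plan shape for later chunks.
DECLARE @DeviceId int = 1,
        @ChunkStart datetime2 = '2023-01-01',
        @ChunkEnd   datetime2 = '2023-01-02';
SELECT DeviceId, MeasuredAt, Value
FROM dbo.Measurements
WHERE DeviceId = @DeviceId
  AND MeasuredAt >= @ChunkStart
  AND MeasuredAt <  @ChunkEnd
OPTION (RECOMPILE);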
I am parameterizing my web app's ad hoc SQL. As a result, I expect the query plan cache to shrink and have a higher hit ratio. Perhaps other important metrics will improve as well.
Could I use perfmon to track this? If so, what counters should I use? If not perfmon, how could I report on the impact of this change?
SQL Server, Plan Cache object:
Cache Hit Ratio – ratio between cache hits and lookups.
Cache Object Counts – number of cache objects in the cache.
Cache Pages – number of 8-kilobyte (KB) pages used by cache objects.
Cache Objects in use – number of cache objects in use.
Also sys.dm_os_memory_clerks and sys.dm_os_memory_cache_counters will give information about memory allocations (in general) and SQL caches (in general). You'll be interested in allocation for the plan cache memory clerk.
And finally there are the execution DMVs: sys.dm_exec_query_stats and sys.dm_exec_cached_plans.
These counters and DMVs should cover what you need, for more details see Execution Plan Caching and Reuse.
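As a concrete starting point, the DMVs can be queried before and after the change; a hedged sketch (requires VIEW SERVER STATE; pages_kb assumes SQL Server 2012 or later):
-- Plan cache size and reuse per object type: many single-use ad hoc
-- plans indicate poor parameterization; usecounts should rise after
-- the app is parameterized.
SELECT cp.objtype,
       COUNT(*) AS plan_count,
       SUM(CAST(cp.size_in_bytes AS bigint)) / 1024 / 1024 AS size_mb,
       SUM(CASE WHEN cp.usecounts = 1 THEN 1 ELSE 0 END) AS single_use_plans
FROM sys.dm_exec_cached_plans AS cp
GROUP BY cp.objtype
ORDER BY size_mb DESC;
-- Memory held by the plan cache clerks (ad hoc/prepared vs. objects).
SELECT type, SUM(pages_kb) / 1024 AS size_mb
FROM sys.dm_os_memory_clerks
WHERE type IN ('CACHESTORE_SQLCP', 'CACHESTORE_OBJCP')
GROUP BY type;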
You can use SQL Server Profiler. Create a new trace, and capture the TSQL->Exec Prepared Sql and TSQL->Prepare Sql events. The former will tell you when it's reusing a query plan, the latter when it is regenerating the plan.
You can do the same for Stored Procedures as well, under the SP category of events.