How to clear the last run query cache in Snowflake

I want to test query performance. Example:
select * from vw_testrole;
vw_testrole has a lot of joins. Since the data is cached, the query returns quickly. How can I see the query plan, or clear the cache, so that I can see the original time taken to execute?

Some extra info, as you are planning to do some "performance tests" to determine the expected execution time for a query.
The USE_CACHED_RESULT parameter disables the use of cached query results. It doesn't delete the existing caches. If you disable it, you can see the query plan (as you wanted), and your query will be executed each time without checking whether the result is already available from a previous run of the same query. But you should know that Snowflake has multiple caches:
The warehouse cache: As Simeon mentioned in the comment, Snowflake caches recently accessed remote data (on the shared storage) on the local disks of the warehouse nodes. That's not easy to clear; even suspending a warehouse may not delete it.
The metadata cache: If your query accesses very big tables and compile time is long because of accessing metadata (for calculating stats etc.), then this cache could be very important. When you re-run the query, it will probably read from the metadata cache and significantly reduce compile time.
The result cache: This is the one you are disabling.
Note that running the following commands will not disable it:
ALTER SESSION UNSET USE_CACHED_RESULT=FALSE;
ALTER SESSION UNSET USE_CACHED_RESULT;
The first one will give the error you experienced. The second one will not give an error, but the default value is TRUE, so it actually re-enables the cache. The correct command is:
ALTER SESSION SET USE_CACHED_RESULT=FALSE;
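Putting it together, a minimal test session might look like the sketch below. vw_testrole comes from the question; the warehouse name my_wh is a hypothetical placeholder, and suspending the warehouse may (but is not guaranteed to) drop its local disk cache:
ALTER SESSION SET USE_CACHED_RESULT=FALSE;  -- bypass the result cache
ALTER WAREHOUSE my_wh SUSPEND;              -- my_wh is a hypothetical warehouse name
ALTER WAREHOUSE my_wh RESUME;
select * from vw_testrole;                  -- timed run against a (mostly) cold cache
select system$explain_plan_json(last_query_id()) as explain_plan;  -- plan of the run above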

You can disable the result cache with ALTER SESSION SET USE_CACHED_RESULT=FALSE;
To get the plan of the last query ID, you can use the statement below:
select system$explain_plan_json(last_query_id()) as explain_plan;

Related

Is there a way to force a JDBC prepared statement to re-prepare in Postgresql?

I have reviewed the similar question "See and clear Postgres caches/buffers?", but all of the answers focus on the data buffers, and PostgreSQL has changed a lot since 2010.
Unlike the OP of that question, I am not looking for consistent behavior when measuring performance; I am looking for adaptive behavior as the database changes over time.
In my application, at the beginning of a job execution, the working tables are empty. Queries run very quickly, but as time goes on performance degrades because the prepared statements are not using ideal access paths (they were prepared when the tables were empty - doh!). Since a typical execution of the job will ultimately cover a few hundred million rows, I need to minimize all of the overheads and periodically run statistics to get the best access paths.
In SQL Server, one can periodically call UPDATE STATISTICS and DBCC FREEPROCCACHE, and the prepared statements will automatically be re-prepared to use the new access paths.
Edit: FreeProcCache: in SQL Server, prepared statements are implemented as stored procedures. FreeProcCache wipes the compiled stored procedures so that they will be recompiled on the next invocation, and the new access paths come into effect immediately.
Edit: Details of PostgreSQL's management of prepared statements: PostgreSQL defers the prepare until the first call to EXECUTE, and caches the result of the prepare after the 5th execution. Once cached, the plan is fixed until the session ends or the prepared statement is freed with DEALLOCATE. Closing JDBC objects does not invoke DEALLOCATE, as an optimization to support the open/read/close pattern that many web apps use.
Is there a way to force a JDBC prepared statement to recompile, after running ANALYZE, so it will use the latest statistics?
EDIT: I am using JDBC PreparedStatement to prepare and execute queries against the database, with the Postgres JDBC driver.
The way PostgreSQL updates statistics is via ANALYZE. This is also run automatically after a VACUUM pass (since VACUUM frees references and truncates empty pages, I would imagine much like your FreeProcCache).
If autovacuum is enabled (the default), ANALYZE will run automatically according to the autovacuum cadence.
In most cases you do not need to "recompile" the prepared statement to pick up the new statistics, because it will re-plan during each EXECUTE, and a parameterized prepared statement will re-plan based on the parameter values and the updated statistics. EDIT: The edge case described is where the query planner has decided to lock in a "generic plan" because, after 5 planned executions, the estimated cost of the specific plan exceeds the cost of that generic plan.
Edit:
If you do reach this edge case, you can "drop" the prepared statement via DEALLOCATE (and then a re-PREPARE).
You may want to try ANALYZE before EXECUTE, but this will not guarantee better performance...
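A minimal sketch of that recovery path in plain SQL; the table work_table, column batch_id, and statement name q are hypothetical placeholders:
-- work_table, batch_id and q are hypothetical names for illustration
PREPARE q (int) AS SELECT count(*) FROM work_table WHERE batch_id = $1;
EXPLAIN EXECUTE q(1);  -- shows the plan actually chosen for this execution
-- After the table has grown, refresh statistics and drop the stale plan:
ANALYZE work_table;
DEALLOCATE q;
PREPARE q (int) AS SELECT count(*) FROM work_table WHERE batch_id = $1;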
Please ensure you really want to re-prepare statements. It might be the case that you just want to close the DB connection from time to time so statements get prepared "from scratch".
In case you really understand what you are doing (there might be valid reasons, like those you describe), you can issue the DEALLOCATE ALL statement (a PostgreSQL-specific statement that deallocates all prepared statements). Recent pgjdbc versions (since 9.4.1210, 2016-09-07) handle that just fine and re-prepare the statements on subsequent use.
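For instance, a session-wide reset might look like this sketch, assuming the pgjdbc behavior described above:
ANALYZE;         -- refresh statistics (here, for the whole database)
DEALLOCATE ALL;  -- drop every prepared statement in this session;
                 -- pgjdbc >= 9.4.1210 transparently re-prepares on next use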

How to Clear Cache in Sybase ASE

I want to clear the cache of Sybase ASE, so that I can always test the worst-case scenario in two different queries.
What I found in my research was to use the commands below to clear the cache, and sp_helpcache to check cached objects:
sp_unbindcache <dbname>, <table>
sp_unbindcache_all <cache name>
How did I test it?
I ran a SELECT COUNT on a table before and after running sp_unbindcache, and the second test was to run the query before and after sp_unbindcache_all.
What happened?
The first time I ran the query there was physical I/O; the subsequent tries had only logical I/O. (The cache was preserved despite running the unbindcache commands.)
Weird Stuff
When I ran sp_helpcache it didn't show my table in the list of objects in Cache Binding Information (CBI). After running sp_unbindcache_all, sp_helpcache showed no rows in CBI. I then re-ran the query and sp_helpcache still showed an empty CBI.
This is weird because it might mean that when I run a query, my table is cached somewhere else.
The Question
So I would like to know: how can I find where my table is being cached when I run a query, and how can I clear it from there?
Other Info
Database: SYBASE ASE 15.7
sp_helpcache only shows "default data cache"
Cache Binding Information(CBI) - is part of sp_helpcache's output
UPDATE:
I have made a new test where I bound the table to the "default data cache" to see if it would appear in CBI, and it appeared.
sp_helpcache only shows the bindings, not what's in the cache. For that, you can use some of the MDA tables.
To clear the cache, binding and unbinding a table (or database) will do the job. Of course, rebooting ASE will too.
To clear the cache in the "default data cache" you should use dbcc cachedataremove.
For a user-defined cache you should use sp_unbindcache or sp_unbindcache_all.
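A hedged sketch of that workflow using the commands above; the database name mydb, table name mytable, and cache name user_cache are hypothetical, and exact syntax may vary across ASE versions:
-- mydb, mytable and user_cache are hypothetical names for illustration
-- Check caches and bindings first:
sp_helpcache
go
-- Unbind one table, or everything bound to a named cache:
sp_unbindcache mydb, mytable
go
sp_unbindcache_all 'user_cache'
go
-- Flush the default data cache:
dbcc cachedataremove
go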

Does SQL Server always physically build the full result set?

Imagine I have a big table with 20 columns and a billion rows of data. Then I run a simple query like:
select [First Name], [Last Name]
from Audience;
After that I read the result set sequentially. Will SQL Server physically create all records (i.e., a billion records) on the server side in the result set before I start reading it? Is there any query plan that will build the result set dynamically while feeding it to the client?
I understand that concurrency reasons may prevent this. Can I give a hint that multiuser access is not possible? Maybe I should use cursors?
It depends on the query plan. If the query does not require any temporary internal structures then yes, you get an immediate response, even before the full recordset has been constructed. If the query does require temporary internal storage (e.g. you are sorting it in a manner that doesn't match any index, or an index is available but a different one is used because it requires less I/O) then you will have to wait until the full recordset is constructed.
The only way to tell is to look at the query plan and examine each and every step. You will need to know how to interpret them... for example, a DISTINCT will require a temporary structure whereas a FLOW DISTINCT will not. If the query plan shows an EAGER SPOOL you will definitely have to wait, although there are a few things you can do to avoid them.
Note: You can't rely on this-- query plans can change depending not just on schema or indexes but on database statistics (e.g. selectivity), which are always changing.
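As an illustration, here is a sketch of two queries against the table from the question; the first can stream rows to the client as they are read, while the second (assuming no index on [Last Name], which the question does not specify) must finish a full sort before the first row is returned:
-- Streams: a simple scan needs no blocking operator
select [First Name], [Last Name]
from Audience;
-- Blocks: an ORDER BY with no matching index forces a full sort first
select [First Name], [Last Name]
from Audience
order by [Last Name];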

Clustered DB - Stado - Slow first query

Using PostgreSQL in a clustered database (Stado) on two nodes. I managed to configure the Stado coordinator and node agents successfully, but when I try running a heavy query, the first time it takes too long to show results, and after that it is fast.
When I restart the server it goes slow again. It's like Stado does some caching or something. I thought the problem was because of Stado initialization and thus configured the agents, but the problem still exists! Any ideas?
EDIT
Query:
SELECT id,position,timestamp
FROM table t1
WHERE id <> 0
AND ST_Intersects(ST_Buffer_Meters(ST_SetSRID(
ST_MakePoint(61.4019, 15.218205), 4326), 1160006), position)
AND timestamp BETWEEN '2013-10-01' AND '2014-01-01';
Explain:
Step: 0
_______
Target: CREATE UNLOGGED TABLE "TMPTT7_1" ( "XCOL1" INT) WITHOUT OIDS
SELECT: SELECT count(*) AS "XCOL1" FROM "t1" WHERE "t1"."timestamp" BETWEEN '2013-10-01' AND '2014-01-01' AND ("t1"."id"<>0) AND ST_Intersects(ST_Buffer_Meters(ST_SetSRID(
ST_MakePoint(61.4019, 15.218205), 4326), 1160006), "t1"."position")
Step: 1
_______
Select: SELECT SUM("XCOL1") AS "EXPRESSION6" FROM "TMPTT7_1"
Drop:
TMPTT7_1
Two reasons:
Caching, obviously. When a query is executed the first time with a cold cache, the cache gets populated. That goes for the system cache as well as the database cache; both work together, at least in standard Postgres. This can make a huge difference.
Query plan caching, possibly, to a much lesser degree. If you run the same query in a single session repeatedly, plans (for PL/pgSQL functions, for instance) are cached.
Depending on your type of connection to the database, there may also be network latency, which may be higher for the first call.
Caching in memory is the reason, that is correct. A good tip for this type of situation is to "warm up" the database each time you restart it, with a script that runs the query (or a similar query that still accesses the same data). In some cases I have seen instances where several warm-up queries are run after any type of restart, and users still have a good experience. You will still have to wait for the warm-up query to finish after a restart, but at least it will not be a user waiting for it.
The other possibility is that you are running a non-indexed query; you should check for that. If the query is indexed and accesses a reasonable amount of data by a key, then it should be fast (even without the warm-up, for most queries). This is a very common problem and easy to miss. Use the Postgres EXPLAIN command; it will show you how the query is being executed against the database (i.e., with an index or without). Both suggestions are sketched below.
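A hedged sketch of both suggestions, reusing the query from the question (run on one of the Postgres nodes; the index name is a hypothetical placeholder):
-- Check whether the heavy query can use an index:
EXPLAIN
SELECT id, position, timestamp
FROM t1
WHERE id <> 0
  AND timestamp BETWEEN '2013-10-01' AND '2014-01-01';
-- If it cannot, an index on the filter column may help
-- (idx_t1_timestamp is a hypothetical name):
CREATE INDEX idx_t1_timestamp ON t1 (timestamp);
-- Warm-up script to run once after every restart, so the first user
-- query finds the caches already populated:
SELECT count(*) FROM t1
WHERE timestamp BETWEEN '2013-10-01' AND '2014-01-01';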

SQL Server lock/hang issue

I'm using SQL Server 2008 on Windows Server 2008 R2, all sp'd up.
I'm getting occasional issues with SQL Server hanging, with the CPU usage at 100%, on our live server. It seems all the wait time on SQL Server when this happens is attributed to SOS_SCHEDULER_YIELD.
Here is the Stored Proc that causes the hang. I've added the "WITH (NOLOCK)" in an attempt to fix what seems to be a locking issue.
ALTER PROCEDURE [dbo].[MostPopularRead]
AS
BEGIN
SET NOCOUNT ON;
SELECT
c.ForeignId , ct.ContentSource as ContentSource
, sum(ch.HitCount * hw.Weight) as Popularity
, (sum(ch.HitCount * hw.Weight) * 100) / @Total as Percent
, @Total as TotalHits
from
ContentHit ch WITH (NOLOCK)
join [Content] c WITH (NOLOCK) on ch.ContentId = c.ContentId
join HitWeight hw WITH (NOLOCK) on ch.HitWeightId = hw.HitWeightId
join ContentType ct WITH (NOLOCK) on c.ContentTypeId = ct.ContentTypeId
where
ch.CreatedDate between @Then and @Now
group by
c.ForeignId , ct.ContentSource
order by
sum(ch.HitCount * hw.HitWeightMultiplier) desc
END
The stored proc reads from the table "ContentHit", which tracks when content on the site is clicked (it gets hit quite frequently - anything from 4 to 20 hits a minute). So it's pretty clear that this table is the source of the problem. There is a stored proc that is called to add hit tracks to the ContentHit table; it's pretty trivial, it just builds up a string from the params passed in, which involves a few selects from some lookup tables, followed by the main insert:
BEGIN TRAN
insert into [ContentHit]
(ContentId, HitCount, HitWeightId, ContentHitComment)
values
(@ContentId, isnull(@HitCount,1), isnull(@HitWeightId,1), @ContentHitComment)
COMMIT TRAN
The ContentHit table has a clustered index on its ID column, and I've added another index on CreatedDate since that is used in the select.
When I profile the issue, I see the Stored proc executes for exactly 30 seconds, then the SQL timeout exception occurs. If it makes a difference the web application using it is ASP.NET, and I'm using Subsonic (3) to execute these stored procs.
Can someone please advise how best I can solve this problem? I don't care about reading dirty data...
EDIT:
The MostPopularRead stored proc is called very infrequently - it's called on the home page of the site, but the results are cached for a day. The pattern of events that I am seeing is: when I clear the cache, multiple requests come in for the home page, and they all hit the stored proc because it hasn't yet been cached. SQL Server then maxes out, and the situation can only be resolved by restarting the SQL Server process. When I do this, usually the proc will execute OK (in about 200 ms) and put the data back in the cache.
EDIT 2:
I've checked the execution plan, and the query looks quite sound. As I said earlier, when it does run it only takes around 200 ms to execute. I've added MAXDOP 1 to the select statement to force it to use only one CPU core, but I still see the issue. When I look at the wait times I see that XE_DISPATCHER_WAIT, ONDEMAND_TASK_QUEUE, BROKER_TRANSMITTER, KSOURCE_WAKEUP and BROKER_EVENTHANDLER are taking up a massive amount of wait time.
EDIT 3:
I previously thought that this was related to Subsonic, our ORM, but having switched to ADO.NET, the error still occurs.
The issue is likely concurrency, not locking. SOS_SCHEDULER_YIELD occurs when a task voluntarily yields the scheduler for other tasks to execute. During this wait the task is waiting for its quantum to be renewed.
How often is [MostPopularRead] SP called and how long does it take to execute?
The aggregation in your query might be rather CPU-intensive, especially if there are lots of data and/or ineffective indexes. So, you might end up with high CPU pressure - basically, a demand for CPU time is too high.
I'd consider the following:
Check which other queries are executing while the CPU is 100% busy. Look at sys.dm_os_waiting_tasks, sys.dm_os_tasks, and sys.dm_exec_requests (see the sketch after this list).
Look at the query plan of [MostPopularRead], try to optimize the query. Quite often an ineffective query is the root cause of a performance problem, and query optimization is much more straightforward than other performance improvement techniques.
If the query plan is parallel and the query is often called by multiple clients simultaneously, forcing a single-threaded plan with a MAXDOP=1 hint might help (abundant use of parallel plans is usually indicated by SOS_SCHEDULER_YIELD and CXPACKET waits).
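A hedged sketch of the DMV check meant in the first item above (columns abbreviated; run it from a separate session while the CPU is pegged):
-- What is running right now, and what is it waiting on?
select r.session_id, r.status, r.wait_type, r.cpu_time, t.text
from sys.dm_exec_requests r
cross apply sys.dm_exec_sql_text(r.sql_handle) t
where r.session_id <> @@SPID;  -- exclude this monitoring session itself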
Also, have a look at this paper: Performance tuning with wait statistics. It gives a pretty good summary of different wait types and their impact on performance.
P.S. It is easier to use SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before a query instead of adding (nolock) to each table.
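For example, the body of the procedure could set the isolation level once instead of repeating the hint; a sketch using the tables from the question:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
select c.ForeignId, ct.ContentSource, sum(ch.HitCount * hw.Weight) as Popularity
from ContentHit ch
join [Content] c on ch.ContentId = c.ContentId
join HitWeight hw on ch.HitWeightId = hw.HitWeightId
join ContentType ct on c.ContentTypeId = ct.ContentTypeId
group by c.ForeignId, ct.ContentSource;  -- no per-table NOLOCK hints needed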
Remove the NOLOCK hint.
Open a query window in SSMS, run SET STATISTICS IO ON, and run the query in the procedure. Let it finish and post the IO stats messages here. Then post the table definitions and all indexes defined on them. Then somebody will be able to reply with the proper indexes you need.
As with all SQL performance problems, the text of the query is largely irrelevant without the complete schema definition.
A guesstimate covering index would be:
create index ContentHitCreatedDate
on ContentHit (CreatedDate)
include (HitCount, ContentId, HitWeightId);
Update
XE_DISPATCHER_WAIT, ONDEMAND_TASK_QUEUE, BROKER_TRANSMITTER, KSOURCE_WAKEUP and BROKER_EVENTHANDLER: you can safely ignore all these waits. They show up because they represent threads parked and waiting to dispatch XEvents, Service Broker or internal SQL thread pool work items. As they spend most of their time parked and waiting, they accumulate unrealistically large wait times. Ignore them.
If you believe ContentHit to be the source of your problem, you could add a covering index:
CREATE INDEX IX_CONTENTHIT_CONTENTID_HITWEIGHTID_HITCOUNT
ON dbo.ContentHit (ContentID, HitWeightID, HitCount)
Take a look at the Query Plan if you want to be certain about the bottleneck in your query.
By default, SQL Server uses all cores/CPUs for all queries (the "max degree of parallelism" advanced setting; DoP = Degree of Parallelism), which can lead to 100% CPU even if only one core is actually waiting for some I/O.
If you search the net or this site you will find resources explaining it better than I can (such as monitoring your I/O even though you see a CPU-bound problem).
On one server we couldn't change the application with a bad query that locked down all CPU resources, but by setting DoP to half the number of cores we managed to keep the server from getting "stopped". The effect of the queries being less parallel was negligible in our case. A sketch of the setting is shown below.
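A hedged sketch of capping server-wide parallelism (the value 4 is just an example; pick roughly half your core count as described above):
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;  -- 4 is an example value
RECONFIGURE;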
Thanks to all who posted, I got some great SQL Server perf tuning tips.
In the end we ran out of time to resolve this mystery - we found a more efficient way to collect this information and cache it in the database, so that solved the problem for us.
