How to access snowflake query profile overview statistics via SQL?


In Snowflake's Snowsight UI, the Query Profile view has a section called Profile Overview where you can see the breakdown of the total execution time. It contains statistics like Processing, Local Disk I/O, Remote Disk I/O, Synchronization, etc.
Full list here
https://docs.snowflake.com/en/user-guide/ui-snowsight-activity.html#profile-overview
I want to access those statistics programmatically instead of having to navigate to that section for each query that I want to analyze. The only system view I know of that provides query statistics is QUERY_HISTORY, but it doesn't contain those stats.
https://docs.snowflake.com/en/sql-reference/account-usage/query_history.html
Question is, can I get those stats in any of the system views? If so, where and how?

It is possible to access the query profile programmatically using GET_QUERY_OPERATOR_STATS:
Returns statistics about individual query operators within a query. You can run this function for any query that was executed in the past 14 days.
For example, you can use this information to determine which operators are consuming the most resources. As another example, you can use this function to identify joins that have more output rows than input rows, which can be a sign of an “exploding” join (e.g. an unintended Cartesian product).
These statistics are also available in the query profile tab in Snowsight. The GET_QUERY_OPERATOR_STATS() function makes the same information available via a programmatic interface.
The GET_QUERY_OPERATOR_STATS function is a table function. It returns rows with statistics about each query operator in the query.
set query_id = '<query_id>';
select *
from table(get_query_operator_stats($query_id));
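To get something close to the Profile Overview totals from that output, the per-operator rows can be summed by category on the client side. A minimal Python sketch, assuming each row's EXECUTION_TIME_BREAKDOWN column arrives as a JSON string and that each category value is already a share of the whole query's time (if the values turn out to be relative to the operator, they would need weighting by overall_percentage first). The key names and the sample rows below are illustrative assumptions, not real results, so check them against the actual function output:

```python
import json
from collections import defaultdict


def summarize_breakdown(rows):
    """Sum each execution-time category across all operator rows.

    `rows` is an iterable of dicts whose EXECUTION_TIME_BREAKDOWN entry is
    either a JSON string or an already-parsed dict (assumed shape).
    """
    totals = defaultdict(float)
    for row in rows:
        breakdown = row["EXECUTION_TIME_BREAKDOWN"]
        if isinstance(breakdown, str):
            breakdown = json.loads(breakdown)
        for category, value in breakdown.items():
            if category == "overall_percentage":
                continue  # the operator's own share, not a time category
            totals[category] += value
    return dict(totals)


# Made-up rows for illustration only
sample_rows = [
    {
        "OPERATOR_TYPE": "TableScan",
        "EXECUTION_TIME_BREAKDOWN": '{"overall_percentage": 0.75, "processing": 0.25, "remote_disk_io": 0.5}',
    },
    {
        "OPERATOR_TYPE": "Result",
        "EXECUTION_TIME_BREAKDOWN": '{"overall_percentage": 0.25, "processing": 0.25}',
    },
]

print(summarize_breakdown(sample_rows))
```

In practice the rows would come from running the select above through whatever Snowflake client you already use and passing the fetched rows to summarize_breakdown.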

2023 update: GET_QUERY_OPERATOR_STATS()
See Lukasz's answer at https://stackoverflow.com/a/74824120/132438.
https://docs.snowflake.com/en/sql-reference/functions/get_query_operator_stats.html
Bad news: There's no programmatic way to get this.
Good news: This is a frequent request, so we might eventually have news.
In the internal tracker I left a note to update this answer once there is progress we can report.

You can do it via https://github.com/Snowflake-Labs/sfsnowsightextensions#get-sfqueryprofile. Doing it at scale (scraping-style) will likely yield ~60%-80% success rate. Please don't abuse it.
Inspired by a clever customer who did that to get what is now offered by https://docs.snowflake.com/en/sql-reference/account-usage/access_history.html
Completely unsupported, as the repo homepage says.

Just FYI, there is an upcoming feature called GET_QUERY_STATS (currently in private preview) https://docs.snowflake.com/en/LIMITEDACCESS/get_query_stats.html that will do just this and obviate the reason for Get-SFQueryProfile once it ships.

Related

Salesforce SOQL query - Jersey read timeout error

I'm having a problem on a batch job that has a simple SOQL query that returns a lot of records. More than a million.
The query, as it is, cannot be optimized much further according to SOQL best practices. (At least, as far as I know. I'm not an SF SOQL expert.)
The problem is that I'm getting -
Caused by: javax.ws.rs.ProcessingException: java.net.SocketTimeoutException: Read timed out
I tried bumping up the Jersey read timeout value from 30 seconds to 60 seconds, but it still times out.
Any recommendation on how to deal with this issue? Any recommended value for the readtimeout parameter for a query that returns that much data?
The query is like this:
SELECT Id, field1, field2__c, field3__c, field3__c FROM Object__c
WHERE field2__c = true AND (not field3 like '\u0025Some string\u0025')
ORDER BY field4__c ASC

In no specific order...
Batches written in Apex time out after 2 minutes, so maybe set the same limit in your Java application.
Run your query in the Developer Console using the query plan feature (you will probably have to put a real % in there, not \u0025). Pay attention to which part has a "Cost" column > 1.
What are the field types? Plain checkbox and text, or some complex formulas?
Is that text static, or does it change depending on what your app needs? Would you consider filtering out the string in your code rather than in SOQL? It's counter-intuitive to return more records than you really need, but it might be an option.
Would you consider making a formula field with either the whole logic or just the string search, and then asking SF to index the formula? Or maybe making another field (another checkbox?) that says "yes, it contains that text", with the value set by a workflow (essentially preparing your data a bit so you can query it efficiently later)?
Read up on skinny tables and see if they could work for you (needs SF support).
Can you make an analytic snapshot of your data (make a report, have SF save the results to a helper object, query that object)? Even if it only contained lookups to your original source, so that you always access fresh values, it could help. It might be a storage killer though.
Have you considered "big objects" and async SOQL?
I'm not proud of it but in the past I had some success badgering the SF database. Not via API but if I had a nightly batch job that was timing out I kept resubmitting it and eventually 3rd-5th time it managed to start. Something in the query optimizer, creation of cursor in underlying Oracle database, caching partial results... I don't know.
What's in the ORDER BY? Some date field? If you need the records updated since X first, then maybe the replication API could help with getting the ids first.
Does it make sense to use LIMIT 200, for example? Which API are you using, SOAP or REST? It might be that returning smaller chunks (SOAP: batch size, REST API: special header) would help it finish faster.
When all else fails (but do contact SF support and make sure you have exhausted the options), maybe restructure the whole thing. Make SF push data to you whenever it changes, rather than pulling it. There's "Streaming API" (CometD implementation, Bayeux protocol, however these are called) and "Change Data Capture" and "Platform Events" for nice event bus-driven architecture decisions, replaying old events up to 3 days back if the client was down and couldn't listen... But that's a totally different topic.
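Two of the suggestions above, fetching smaller chunks and doing the string filter in your own code, can be combined. A rough Python sketch of the idea (the question's client is Java, so this only shows the shape). fetch_page is a hypothetical stand-in for whatever call returns one chunk of records plus a cursor for the next one; it is not a real Salesforce API, and the data is fake:

```python
def iter_records(fetch_page):
    """Yield records chunk by chunk until the source reports no next cursor."""
    cursor = None
    while True:
        records, cursor = fetch_page(cursor)
        yield from records
        if cursor is None:
            break


def without_substring(records, field, needle):
    """Client-side stand-in for the NOT ... LIKE '%needle%' filter.

    Compares case-insensitively, which is my understanding of how LIKE
    behaves in SOQL (an assumption worth verifying).
    """
    needle = needle.lower()
    for record in records:
        if needle not in (record.get(field) or "").lower():
            yield record


# Fake in-memory source standing in for the remote API
DATA = [
    {"Id": str(i), "field3__c": "Some string" if i % 2 else "other"}
    for i in range(5)
]


def fake_fetch_page(cursor, size=2):
    """Return (records, next_cursor); next_cursor is None on the last chunk."""
    start = cursor or 0
    end = start + size
    next_cursor = end if end < len(DATA) else None
    return DATA[start:end], next_cursor


kept = list(without_substring(iter_records(fake_fetch_page), "field3__c", "Some string"))
print([record["Id"] for record in kept])
```

Each chunk is a short request, so no single read has to survive the whole million-row transfer, at the cost of pulling records that are then thrown away.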

Find out the popularity of values

I have a table with 1000 rows. Each row represents a prompt text for an application. To start with, I only want to translate the most used 20% of the prompts. In daily use, some dialogs appear more often than others, so the prompt texts for the most displayed dialogs get fetched more often than the others.
However, it looks to me like there is no built-in mechanism to analyse the data by their select rates.
There are no triggers on select. There is no way to filter the data in the profiler. There is no way to filter data in an Audit. Is that true?
Are there any options to do that inside SQL Server?

No. There is no way to track the frequency of how often data is selected.
This sounds like application metrics. You will have to write metrics logic yourself.
For example, you might create a table of MissingTranslations that tracks the frequency of requests. If your application detects a missing translation, insert a row into this table with a frequency of 1, or increment the counter if it already exists in the table.
You could then write another application that sorts the missing translations by frequency descending. When a user enters the translation, the translation app removes the entry from the list of missing translations or marks it as complete.
All that being said, you could abuse some SQL Server features to get some information. For example, a stored procedure that returns these translations could generate a user-configurable trace event with the translation info. A SQL Profiler session could listen for these events and write them to a table. This would get you a basic frequency.
It might be possible to get the same information from implementing auditing and then calling sys.fn_get_audit_file, but that sounds cumbersome at best.
In my opinion, it is easier and more stable to write this logic yourself.
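The counter-table idea described above can be sketched in a few lines. This is only an illustration in Python, using an in-memory SQLite database as a stand-in for SQL Server; the column names prompt_id and frequency are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MissingTranslations ("
    " prompt_id INTEGER PRIMARY KEY,"
    " frequency INTEGER NOT NULL)"
)


def record_request(conn, prompt_id):
    """Increment the counter for prompt_id, inserting it with 1 if it is new."""
    cur = conn.execute(
        "UPDATE MissingTranslations SET frequency = frequency + 1 WHERE prompt_id = ?",
        (prompt_id,),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO MissingTranslations (prompt_id, frequency) VALUES (?, 1)",
            (prompt_id,),
        )


def most_requested(conn, limit):
    """Return (prompt_id, frequency) pairs, most requested first."""
    return conn.execute(
        "SELECT prompt_id, frequency FROM MissingTranslations"
        " ORDER BY frequency DESC, prompt_id LIMIT ?",
        (limit,),
    ).fetchall()


# Simulate the application asking for prompts 7, 3 and 9 a few times
for prompt_id in [7, 3, 7, 7, 3, 9]:
    record_request(conn, prompt_id)

print(most_requested(conn, 2))
```

With many concurrent writers, the update-then-insert pair would need to run inside a transaction (or be replaced by the database's own upsert statement) to avoid two sessions inserting the same row.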

@TabAlleman: "no, there's nothing you can do"

Does solr store recent queries?

For example, I fired these queries:
q=id:SOURCE-*
q=sourceName:abc
q=sourceName:xyz
q=id:DB-*
Is there any way to fetch these last 4 queries fired on Solr?

Solr does have a query cache that holds previous queries and the doc ids of their results. Your main issue would be how to use it, as it is mostly for internal use. But you can look into the source code and maybe find a way.

One idea might be to use the Solr logging system. You can set the log level to INFO, which should be enough to retrieve every query.
In addition to the logging options [...], there is a way to
configure which request parameters (such as parameters sent as part of
queries) are logged with an additional request parameter called
logParamsList. See the section on Common Query Parameters for more
information.
For example with logParamsList=q, only the q parameters will be logged.
N.B. Logging every query can potentially impact performance depending on the query rate and the volume of data generated.
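Once queries are in the log, the last few can be pulled out with a small script. A minimal Python sketch, assuming each request line contains a params={...} section with URL-style key=value pairs; the sample lines are made up to resemble that shape and your actual log layout may differ:

```python
import re
from urllib.parse import parse_qs

# Non-greedy match up to the first closing brace; a query that itself
# contains "}" (for example local params) would need a smarter parser.
PARAMS_RE = re.compile(r"params=\{(.*?)\}")


def last_queries(log_lines, count):
    """Return the q parameter of the last `count` logged requests, oldest first."""
    queries = []
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue  # not a request line
        params = parse_qs(match.group(1))
        if "q" in params:
            queries.append(params["q"][0])
    return queries[-count:]


# Made-up log lines for illustration only
sample_log = [
    "INFO o.a.s.c.S.Request [core1] webapp=/solr path=/select params={q=id:SOURCE-*} hits=3 status=0 QTime=1",
    "INFO o.a.s.c.S.Request [core1] webapp=/solr path=/select params={q=sourceName:abc&rows=10} hits=1 status=0 QTime=2",
    "INFO some unrelated startup message",
    "INFO o.a.s.c.S.Request [core1] webapp=/solr path=/select params={q=sourceName:xyz} hits=0 status=0 QTime=1",
    "INFO o.a.s.c.S.Request [core1] webapp=/solr path=/select params={q=id:DB-*} hits=5 status=0 QTime=1",
]

print(last_queries(sample_log, 4))
```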

NHibernate Paging performance on large table (10,000,000 rows)

I have a table that is rather large at about 10,000,000 rows. I need to page through this table from my C# application. I'm using NHibernate. I have tried to use this code example:
return session.CreateCriteria(typeof(T))
.SetFirstResult(startId)
.SetMaxResults(pageSize)
.List<T>();
When I execute it the operation eventually times out if my startId is greater than 7,000,000. The pageSize I'm using is 200. I have used this method on much smaller tables, of less than 1000 rows, and it works and performs quickly.
Question is, on such a large table is there a better way to accomplish this using NHibernate?

You're trying to page through 10 million rows 200 at a time? Why? No human being is going to page through that much data.
You need to filter the dataset first and then apply T-SQL style paging to the smaller data set. Here are some methods that will work. Just modify them so that you're getting down to fewer than 10 million rows through some kind of filtering (a WHERE clause, CTE, or derived table).

Funny you should bring this up, as I am having the same issue. My issue isn't related to paging using NHibernate, but more with just using straight T-SQL.
It seems as though there are a few options. The one I found quite useful in my instance was this answer to a question regarding paging. It discusses using a "keyset driven solution" rather than returning ranked results through the use of ROW_NUMBER(). I'm not sure what NHibernate would use in this instance or if it's possible to see the SQL it generates based on the query you issue (I know you could in Hibernate, but I've not used NHibernate).
If you aren't aware of using SQL Server to return ranked results based on ROW_NUMBER, then it's well worth looking into. A lot of people seem to refer to this article as to how to go about paging. I've seen some subsequent posts discourage the use of SET ROWCOUNT though, in favour of using TOP with a dynamic parameter: SELECT TOP(@NumOfResults).
There are lots of posts here on SO regarding this, but no definitive answer on the best way to go about it as far as I can see. I'll be keeping an eye on this post to see what others suggest also.
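For what the keyset approach mentioned above looks like in practice: instead of asking the database to skip N rows, each page asks for the rows whose key comes after the last key already seen, so an index on the key can jump straight to the start of the page. A minimal sketch in Python, using an in-memory SQLite table as a stand-in for the real SQL Server table (the items table and its columns are made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 1001)],
)


def next_page(conn, last_seen_id, page_size):
    """Return the page of rows whose id comes right after last_seen_id."""
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size),
    ).fetchall()


# Walk the whole table 200 rows at a time, remembering only the last id
last_id = 0
pages = 0
while True:
    page = next_page(conn, last_id, 200)
    if not page:
        break
    pages += 1
    last_id = page[-1][0]

print(pages, last_id)
```

The trade-off is that you can only move to the next page from a known position, not jump to an arbitrary page number. In T-SQL the same query shape would use TOP instead of LIMIT.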

It could be an isolation level problem.
I had a similar issue.
If the table you're reading from is constantly updated, the updater locks parts of the table, causing a timeout when reading from the table.
Read with the ReadUncommitted isolation level, for example by opening the transaction with session.BeginTransaction(IsolationLevel.ReadUncommitted). Note that the data might be a little dirty.

Paging of Query Result Set

Greetings Overflowers,
I'm wondering if there is a way to query some kind of a database and only fetch a certain window in the full result set without having to actually go through them all.
For example, if I query my database and I want only results number 100 to 200, would the database fetch all the results (say 0 to 1000) that match my query and later on filter them to exclude anything outside my specified window frame?
Actually, I'm working on a full text search problem (not really relational db stuff).
So how about Google and other search engines: do they get the full result and then filter, or do they have direct access to only the needed window frame?
Thank you all !

Your question is probably best answered in two parts.
For a database (traditional, relational), a query that is executed contains a number of "where" clauses, which will cause the database engine to limit the number of results that it returns. So if you specify a where clause that basically limits between 2 values of the primary key,
select * From table where id>99 and id<201;
you'll get what you're asking for.
For a search engine, a query you make to get the results will always paginate - using various techniques, all the results will be pre-split into pages and a few will be cached. Other pages will be generated on demand. So if you want pages 100-200 then you only ever fetch those that are needed.
The option to filter is not very efficient because large data sources never want to load all their data into memory and slice - you only want to load what's needed.
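One caveat about the id range above: it only equals "results number 100 to 200" when the ids are contiguous and the query has no other filter. Asking for a window by position is usually done with LIMIT and OFFSET (or the engine's equivalent). A small Python sketch with an in-memory SQLite table, where the docs table is made up and deliberately has gaps in its ids:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY)")
# Only even ids exist, so id values and result positions differ
conn.executemany("INSERT INTO docs (id) VALUES (?)", [(i,) for i in range(0, 2000, 2)])


def fetch_window(conn, first, last):
    """Return results number first..last (0-based, inclusive) in id order."""
    rows = conn.execute(
        "SELECT id FROM docs ORDER BY id LIMIT ? OFFSET ?",
        (last - first + 1, first),
    ).fetchall()
    return [row[0] for row in rows]


window = fetch_window(conn, 100, 200)
by_id_range = [
    row[0]
    for row in conn.execute("SELECT id FROM docs WHERE id > 99 AND id < 201 ORDER BY id")
]

print(len(window), window[0], window[-1])
print(len(by_id_range), by_id_range[0], by_id_range[-1])
```

Only the requested rows are returned to the client either way, although with OFFSET the engine may still have to walk past the skipped rows internally, which is why deep offsets get slow on large tables.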
