Change tracking is not supported on queries with QUALIFY - snowflake-cloud-data-platform

I am trying to create a Stream on top of a VIEW. The VIEW joins multiple other VIEWs that use QUALIFY to get the latest record for each PK from the base tables.
Snowflake gives me the following error when I try to create this Stream:
Change tracking is not supported on queries with QUALIFY.
What are my options? Thanks.
Note:
Change tracking is not supported on queries with window functions.

Streams on views with the following operations are not yet supported:
- GROUP BY clauses
- QUALIFY clauses
- Subqueries not in the FROM clause
- Correlated subqueries
- LIMIT clauses
You can read more about this here.
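For reference, here is the "latest record per PK" logic that the underlying views compute with QUALIFY, written as a derived table filtered on ROW_NUMBER(). The sketch runs it in SQLite through Python's sqlite3 module purely as a stand-in. SQLite has no QUALIFY clause and needs version 3.25 or later for window functions. The table base and its columns pk, val and updated_at are hypothetical. This form still uses a window function, which the note above says change tracking does not support either. So it shows what the view computes, not a way around the restriction.

```python
import sqlite3


def latest_per_key(conn):
    """Return the most recent row for each pk (hypothetical table `base`).

    Same intent as a Snowflake
        QUALIFY ROW_NUMBER() OVER (PARTITION BY pk ORDER BY updated_at DESC) = 1
    but written as a derived table, because SQLite has no QUALIFY clause.
    """
    sql = """
        SELECT pk, val, updated_at
        FROM (
            SELECT pk, val, updated_at,
                   ROW_NUMBER() OVER (PARTITION BY pk ORDER BY updated_at DESC) AS rn
            FROM base
        )
        WHERE rn = 1
        ORDER BY pk
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base (pk INTEGER, val TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO base VALUES (?, ?, ?)", [
    (1, "a-old", 1), (1, "a-new", 2),
    (2, "b-only", 5),
    (3, "c-old", 3), (3, "c-mid", 4), (3, "c-new", 9),
])
print(latest_per_key(conn))  # [(1, 'a-new', 2), (2, 'b-only', 5), (3, 'c-new', 9)]
```

One option consistent with the restrictions listed above is to create the stream on the base tables, which involves no QUALIFY. You would then apply this deduplication when consuming the stream's changes. Whether that fits depends on your pipeline.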

Related

MAX on a VIEW doing a TableScan instead of using metadata lookup. Why doesn't SF use metadata?

I noticed even the simplest 'SELECT MAX(TIMESTAMP) FROM MYVIEW' is somewhat slow (taking minutes) in my environment, and found it's doing a TableScan of 100+GB across 80K micropartitions.
My expectation was that this would finish in milliseconds using the MIN/MAX/COUNT metadata in each micropartition. In fact, I do see Snowflake finishing the job in milliseconds using metadata for an almost identical MIN/MAX lookup in the following article:
http://cloudsqale.com/2019/05/03/performance-of-min-max-functions-metadata-operations-and-partition-pruning-in-snowflake/
Is there any limitation in how Snowflake decides to use metadata? Could it be because I'm querying through a view, instead of querying a table directly?
=== Added for clarity ===
Thanks for answering! Regarding how the view is defined, it seems to add a WHERE clause for additional filtering on a cluster key. So I believe it should still be possible to make full use of the micropartition metadata. But as posted, a TableScan shows up in the profiler output.
I'm a bit concerned about your comment on secure views. The view I'm querying is indeed a SECURE VIEW; does that affect how the optimizer handles my query? Could that be the reason a TableScan is done?
It looks like you're running the query on a view. The metadata you're referring to will be used when you run a simple MIN, MAX, etc. on the table. However, if your view contains logic that requires filtering or joining data, Snowflake cannot return results based on the metadata alone.
So yes, you are correct when you say the following because your view is probably doing something other than a simple MAX on the table:
...Could it be because I'm querying through a view, instead of querying a table directly?
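A toy Python model may make the answer more concrete. It assumes each partition stores the maximum of a ts column and the min/max of a region column when it is written. The names are hypothetical, and this illustrates the idea only; it is not a description of Snowflake's actual optimizer. In this simple model, an unfiltered MAX is answered from metadata alone. A filter like the one inside the view can only prune partitions, and the surviving partitions' rows still have to be read.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """A toy micro-partition: rows of (ts, region) plus per-partition metadata."""
    rows: list
    ts_max: int = field(init=False)
    region_lo: str = field(init=False)
    region_hi: str = field(init=False)

    def __post_init__(self):
        # Metadata is computed once, when the partition is "written".
        self.ts_max = max(ts for ts, _ in self.rows)
        regions = [region for _, region in self.rows]
        self.region_lo, self.region_hi = min(regions), max(regions)


def max_ts_table(parts):
    """SELECT MAX(ts) FROM the table: answered from metadata, no rows read."""
    return max(p.ts_max for p in parts)


def max_ts_filtered(parts, region):
    """SELECT MAX(ts) through a view that adds WHERE region = <region>.

    Metadata only rules out partitions whose region range excludes the value;
    every remaining partition has its rows read.
    Returns (result, partitions_scanned).
    """
    best, scanned = None, 0
    for p in parts:
        if not (p.region_lo <= region <= p.region_hi):
            continue  # pruned using metadata alone
        scanned += 1
        for ts, r in p.rows:
            if r == region and (best is None or ts > best):
                best = ts
    return best, scanned


parts = [
    Partition([(1, "EU"), (5, "EU")]),
    Partition([(7, "US"), (9, "US")]),
    Partition([(3, "APAC"), (8, "EU")]),
]
print(max_ts_table(parts))           # 9
print(max_ts_filtered(parts, "EU"))  # (8, 2)
```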

Does SQL Server expand a view's sql inline during execution?

Let's say I have a (hypothetical) table called Table1 with 500 columns and there is a view called View1 which is basically
select Column1, Column2,..., Column500, ComputedOrForeignKeyColumn1,...
from Table1
inner join ForeignKeyTables .....
Now, when I execute something like
Select Column32, Column56
from View1
which one of the below 3 does SQL Server turn it into?
Query #1:
select Column32, Column56
from
(select
Column1, Column2,..., Column500, ComputedOrForeignKeyColumn1,...
from
Table1
inner join
ForeignKeyTables ......) v
Query #2:
Select Column32, Column56
from Table1
Query #3:
select Column32, Column56
from
(select Column32, Column56
from Table1) v
The reason I'm asking is that I do have a very wide table with a view on top of it (which basically inner joins to bring in the text for all foreign key IDs). I can't figure out whether SQL Server fetches all columns and then selects the ones that are needed, or fetches only those that are needed (while also ignoring unnecessary joins, etc.). If it is the former, a view would not be the best choice for performance.
SQL Server query compilation can be split into phases:
- Parsing
- Binding
- Optimization
View resolution is performed during binding. At this stage the view reference is replaced with its definition. At this point, unused view columns will be present.
The next stage is optimization, where the bound syntax tree is transformed into an execution plan. The optimizer considers many kinds of manipulations on the execution plan to increase efficiency, and removing unused columns is one of the most basic. At this point, the unused column references will be removed.
So to answer your question, unused columns in the view definition will not impact performance, since the optimizer will be smart enough to remove them.
Note: this answer assumes the view is not indexed. For indexed views, the resolution process works differently, and there is view maintenance overhead for UPDATEs of the base tables.
None of the above. SQL Server will parse the query and create an execution plan. The resulting execution plan is calculated based on many factors, such as indexes, joins, etc.
Your question cannot truly be answered by anyone other than you, by examining that execution plan.
See How do I obtain a Query Execution Plan? for more information.
The view definition is merged with the outer query at a very early stage of compilation. You may or may not get the same execution plan for a query on a view as for an equivalent query touching the base tables. That depends on the complexity of the view and on the limitations of the query optimizer (QO).
For your particular case it's worth noting that an inner join doesn't only fetch data from joined tables, but it also limits the result (in the same way as an IF EXISTS check does). If there is a declarative FK between the tables, the QO will be smart enough not to check the referenced tables, as the existence is guaranteed by the constraint, but otherwise it has to.
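The point that an inner join also filters rows can be seen in a small example. It uses SQLite via Python's sqlite3 module as a stand-in for SQL Server, and the table and column names are hypothetical. With no constraint guaranteeing a matching Status row, selecting only Table1 columns through the view drops the unmatched row. So rewriting the query as Query #2 would change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Status (StatusID INTEGER PRIMARY KEY, StatusText TEXT);
    -- No FOREIGN KEY declared, so nothing guarantees a matching Status row.
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Column32 TEXT, StatusID INTEGER);
    CREATE VIEW View1 AS
        SELECT t.ID, t.Column32, s.StatusText
        FROM Table1 AS t
        INNER JOIN Status AS s ON s.StatusID = t.StatusID;
    INSERT INTO Status VALUES (1, 'Open'), (2, 'Closed');
    INSERT INTO Table1 VALUES (10, 'a', 1), (11, 'b', 2), (12, 'c', 99);
""")

# Only Table1 columns are selected, but the view's inner join still applies.
via_view = conn.execute("SELECT Column32 FROM View1 ORDER BY ID").fetchall()
base_only = conn.execute("SELECT Column32 FROM Table1 ORDER BY ID").fetchall()
print(via_view)   # [('a',), ('b',)]
print(base_only)  # [('a',), ('b',), ('c',)]
```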

Optimization problems with View using Clustered Index Insert on tempdb on SQL Server 2008

I am creating a Java function that needs to use a SQL query with a lot of joins before doing a full scan of its result. Instead of hard-coding a lot of joins I decided to create a view with this complex query. Then the Java function just uses the following query to get this result:
SELECT * FROM VW_####
So the program works fine, but I want to make it faster, since this SELECT command takes a lot of time. After looking at its execution plan I created some indexes, which made it about 30% faster, but I want to make it faster still.
The problem is that every operation in the execution plan has a cost between 0% and 4% except one: a clustered index insert that accounts for about 50% of the execution cost. I think the system is using a temporary table to store the view's data. But an index on this view isn't useful to me, because I need all of its rows.
So what can I do to optimize that insert into CWT_PrimaryKey? I don't think I can turn off that index, because it seems to be part of SQL Server's internals. I read somewhere that this operation can appear when you use cursors, but I don't think I am using one (or does the view use one?).
The command to create the view is something simple (no T-SQL, no OPTION, etc) like:
create view VW_#### as SELECTS AND JOINS HERE
And here is a picture of the problematic part from the execution plan: http://imgur.com/PO0ZnBU
EDIT: More details:
Well, the query that creates the problematic view is a big query that joins a lot of tables. Based on a single parameter, the Java client modifies the query string before creating the view. This view represents a "data unit" from a legacy database migrated to SQL Server that didn't have any foreign or primary keys, so our team chose this strategy. Because of that, the view has more than 50 columns and is built from a join of seven other views.
Main view's query (with a lot of Portuguese words): http://pastebin.com/Jh5vQxzA
The other views (from VW_Sintese1 until VW_Sintese7) are created like this one but without using extra views, they just use joins with the tables that contain the data requested by the main view.
Then the Java client creates a PreparedStatement with the query "Select * from VW_Sintese####" and executes it using the method executeQuery, something like:
String query = "Select * from VW_Sintese####";
PreparedStatement ps = myConn.prepareStatement(query,ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = ps.executeQuery();
And then the program goes on until the end.
Thanks for the attention.
First: you should post the code of the view along with whatever is using the views because of the rest of this answer.
Second: the definition of a view in SQL Server is later used to substitute in querying. In other words, you created a view, but since (I'm assuming) it isn't an indexed view, it is the same as writing the original, long SELECT statement. SQL Server kind of just swaps it out in the DML statement.
From Microsoft's 'Querying Microsoft SQL Server 2012': T-SQL supports the following table expressions: derived tables, common table expressions (CTEs), views, inline table-valued functions.
And a direct quote:
It’s important to note that, from a performance standpoint, when SQL Server optimizes queries involving table expressions, it first unnests the table expression’s logic, and therefore interacts with the underlying tables directly. It does not somehow persist the table expression’s result in an internal work table and then interact with that work table. This means that table expressions don’t have a performance side to them, neither good nor bad, just no side.
This is a long way of reinforcing the first statement: please include the SQL code in the view and what you're actually using as the SELECT statement. Otherwise, we can't help much :) Cheers!
Edit: Okay, so you've created a view (no performance gain there) that does 4-5 LEFT JOINs onto the main view. Again, you're not helping yourself much here, since LEFT JOINs don't eliminate rows. If there are search arguments you can use to filter the result set down to fewer rows, you should include them here. And lastly, you're ordering all of this at the top. So the query engine has to get those views, join them into a massive SELECT statement and figure out the correct order. I'm guessing here, but since the result count is HUGE, SQL Server's engine is probably sorting it in some kind of temporary table.
The short answer: get less data (fewer columns and only the rows you need); don't order the results if the resultset is very large, just get the data to the client and then sort it there.
Again, if you want more help, you'll need to post table schemas and index strategies for all tables that are in the query (including the views that are joined) and you'll need to include all view definitions (including the views that are joined).
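Here is a minimal sketch of the "short answer" above, using SQLite through Python's sqlite3 module as a stand-in for SQL Server. The query names only the columns it needs, filters on the server with a bound parameter, and leaves sorting to the client. The orders table, the orders_view view and the min_amount filter are hypothetical. Whether client-side sorting is actually faster depends on the data volume and the client, so treat this as a pattern to measure, not a guaranteed win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, notes TEXT);
    CREATE VIEW orders_view AS SELECT id, customer, amount, notes FROM orders;
    INSERT INTO orders VALUES
        (1, 'ana', 50.0, 'x'), (2, 'bo', 500.0, 'y'),
        (3, 'cy', 150.0, 'z'), (4, 'di', 5.0, 'w');
""")


def fetch_large_orders(conn, min_amount):
    # Only the needed columns, only the needed rows, and no ORDER BY:
    # the server does not have to sort the whole result set.
    rows = conn.execute(
        "SELECT id, amount FROM orders_view WHERE amount >= ?",
        (min_amount,),
    ).fetchall()
    # Sort on the client instead, largest amount first.
    return sorted(rows, key=lambda row: row[1], reverse=True)


print(fetch_large_orders(conn, 100))  # [(2, 500.0), (3, 150.0)]
```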

Creating readonly views in Sql Server

According to MSDN, views composed of simple selects automatically allow you to use insert/update/delete statements on the table. Is there a way to prevent this - to tell Sql Server that the view is readonly, and you can't use it to modify the table?
The best way would be to remove UPDATE/DELETE/INSERT permissions on the View.
Apart from that, you could create an INSTEAD OF trigger on the view that simply does nothing, so that updates silently fail. Alternatively, there are quite a few constructs that make views non-updatable. You can pick one that doesn't change the view's semantics or efficiency and deliberately include it.
Edit: The below seems to fit the bill.
CREATE VIEW Bar
AS
SELECT TOP 100 PERCENT x
FROM foo
WITH CHECK OPTION
You could specify a UNION operator in order to make SQL Server fail during INSERT/UPDATE/DELETE operations, like this:
create view SampleView
as
select ID, value from table
union all
select 0, '0' where 1=0
The last query doesn't return any rows at all, but it must have the same number of columns, with the same data types, as the first query in order to use the UNION safely. See this link for more info: Different ways to make a table read only in a SQL Server database
The best way to handle this is to allow only SELECT access to views, or to deny INSERT/UPDATE/DELETE access to the given users. Works perfectly. Also create views "WITH (NOLOCK)".
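The INSTEAD OF trigger approach mentioned above can be sketched with SQLite via Python's sqlite3 module. This is only a stand-in: SQLite views are read-only by default and its trigger syntax differs from SQL Server's. The mechanism is the same, though: a do-nothing INSTEAD OF trigger swallows the modification before it reaches the base table. The names foo and bar follow the example above, and the triggers themselves are hypothetical. A trigger that raises an error instead of doing nothing is often easier to debug than a silent failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (x INTEGER);
    INSERT INTO foo VALUES (1), (2);
    CREATE VIEW bar AS SELECT x FROM foo;

    -- One do-nothing INSTEAD OF trigger per operation: the statement
    -- "succeeds" but never reaches the base table.
    CREATE TRIGGER bar_no_insert INSTEAD OF INSERT ON bar BEGIN SELECT 1; END;
    CREATE TRIGGER bar_no_update INSTEAD OF UPDATE ON bar BEGIN SELECT 1; END;
    CREATE TRIGGER bar_no_delete INSTEAD OF DELETE ON bar BEGIN SELECT 1; END;
""")

# None of these raise an error, and none of them change foo.
conn.execute("INSERT INTO bar VALUES (3)")
conn.execute("UPDATE bar SET x = 100")
conn.execute("DELETE FROM bar")
print(conn.execute("SELECT x FROM foo ORDER BY x").fetchall())  # [(1,), (2,)]
```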

How do I filter one of the columns in a SQL Server SQL Query

I have a table (that relates to a number of other tables) where I would like to filter ONE of the columns (RequesterID) - that column will be a combobox where only people that are not sales people should be selectable.
Here is the "unfiltered" query; let's call it QUERY 1:
SELECT RequestsID, RequesterID, ProductsID
FROM dbo.Requests
If I use a separate query, let's call it QUERY 2, to filter RequesterID (a People-related column, connected to People.PeopleID), it looks like this:
SELECT People.PeopleID
FROM People INNER JOIN
Roles ON People.RolesID = Roles.RolesID INNER JOIN
Requests ON People.PeopleID = Requests.RequesterID
WHERE (Roles.Role <> N'SalesGuy')
ORDER BY Requests.RequestsID
Now, is there a way of "merging" the QUERY 2 into QUERY 1?
(dbo.Requests in QUERY 1 has RequesterID populated as a Foreign Key from dbo.People, so no problem there... The connections are all right; I just don't know how to write the SQL query!)
UPDATE
Trying to explain what I mean in a bit more... :
The result set should be a number of REQUESTS, and the number of REQUESTS should not be limited by QUERY 2. QUERY 2's only function is to limit the selectable subset in the Requests.RequesterID column. No, it's not that clear. But in the C# VS2008 implementation I use Requests.RequesterID to eventually populate a ComboBox with [Full name], which is another column in the People table, and in that column I don't want SalesGuy to show up as selectable. Here I'm trying to make it EVEN clearer... (but with the wrong syntax, of course)
SELECT RequestsID, (RequesterID WHERE RequesterID != 8), ProductsID
FROM dbo.Requests
Yes, RequesterID 8 happens to be the SalesGuy :-)
Here is a very comprehensive article on how to handle this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
It covers all the issues and methods of trying to write queries with multiple optional search conditions. The main thing you need to be concerned with is not the duplication of code, but the use of an index. If your query fails to use an index, it will perform poorly. There are several techniques that can be used, which may or may not allow an index to be used.
Here is the table of contents:
- Introduction
- The Case Study: Searching Orders
- The Northgale Database
- Dynamic SQL
  - Introduction
  - Using sp_executesql
  - Using the CLR
  - Using EXEC()
  - When Caching Is Not Really What You Want
- Static SQL
  - Introduction
  - x = @x OR @x IS NULL
  - Using IF statements
  - Umachandar's Bag of Tricks
  - Using Temp Tables
  - x = @x AND @x IS NOT NULL
  - Handling Complex Conditions
- Hybrid Solutions – Using both Static and Dynamic SQL
- Using Views
- Using Inline Table Functions
- Conclusion
- Feedback and Acknowledgements
- Revision History
If you are on the proper version of SQL Server 2008, there is an additional technique that can be used; see: Dynamic Search Conditions in T-SQL Version for SQL 2008 (SP1 CU5 and later)
If you are on that proper release of SQL Server 2008, you can just add OPTION (RECOMPILE) to the query and the local variable's value at run time is used for the optimizations.
Consider this: OPTION (RECOMPILE) will take this code (where no index can be used because of this mess of ORs):
WHERE
(@Search1 IS NULL OR Column1 = @Search1)
AND (@Search2 IS NULL OR Column2 = @Search2)
AND (@Search3 IS NULL OR Column3 = @Search3)
and optimize it at run time to be (provided that only @Search2 was passed in with a value):
WHERE
Column2 = @Search2
and an index can be used (if you have one defined on Column2).
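The same end result, with only the supplied condition left in the WHERE clause, can also be produced in application code. You build the clause from the parameters that were actually passed, which is the idea behind the dynamic SQL techniques in the article. Here is a minimal sketch using SQLite via Python's sqlite3 module as a stand-in; the table T and its columns are hypothetical. Values are bound as parameters rather than concatenated into the SQL text, and column names come only from a fixed list.

```python
import sqlite3

# Fixed list of searchable columns: column names never come from user input.
SEARCH_COLUMNS = ("Column1", "Column2", "Column3")


def search(conn, search1=None, search2=None, search3=None):
    """Return (sql, rows), keeping only conditions whose value was supplied."""
    conditions, params = [], []
    for column, value in zip(SEARCH_COLUMNS, (search1, search2, search3)):
        if value is not None:
            conditions.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT ID FROM T"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY ID"
    return sql, conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T (ID INTEGER PRIMARY KEY, Column1 TEXT, Column2 TEXT, Column3 TEXT);
    CREATE INDEX IX_T_Column2 ON T (Column2);
    INSERT INTO T VALUES (1, 'a', 'x', 'p'), (2, 'b', 'y', 'q'), (3, 'c', 'x', 'r');
""")

print(search(conn, search2="x"))
# ('SELECT ID FROM T WHERE Column2 = ? ORDER BY ID', [(1,), (3,)])
```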
How about this? Since the query already joins on the Requests table, you can simply add its columns to the select list, like so:
SELECT Requests.RequestsID, Requests.RequesterID, Requests.ProductsID
FROM People INNER JOIN
Roles ON People.RolesID = Roles.RolesID INNER JOIN
Requests ON People.PeopleID = Requests.RequesterID
WHERE (Roles.Role <> N'SalesGuy')
ORDER BY Requests.RequestsID
You can in fact select any column from any of the joined tables (Roles, Requests, People, etc.)
It becomes clear if you just replace People.PeopleId with * and it will show you everything retrieved from the tables.
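Running the query above on a few made-up rows shows what it returns. This uses SQLite via Python's sqlite3 module, and RequesterID 8 is the SalesGuy, as in the question. Requests made by a sales person are dropped entirely. The question's UPDATE asks to keep all requests and restrict only the selectable people. For that, one option is a separate lookup query on People and Roles for the ComboBox. It is sketched here as selectable_people, a hypothetical helper that also assumes a hypothetical FullName column. SQLite does not accept the N'...' prefix, so a plain string literal is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Roles (RolesID INTEGER PRIMARY KEY, Role TEXT);
    CREATE TABLE People (PeopleID INTEGER PRIMARY KEY, RolesID INTEGER, FullName TEXT);
    CREATE TABLE Requests (RequestsID INTEGER PRIMARY KEY, RequesterID INTEGER, ProductsID INTEGER);
    INSERT INTO Roles VALUES (1, 'SalesGuy'), (2, 'Engineer');
    INSERT INTO People VALUES (8, 1, 'Sam Sales'), (5, 2, 'Erin Eng');
    INSERT INTO Requests VALUES (100, 5, 1), (101, 8, 2), (102, 5, 3);
""")


def filtered_requests(conn):
    # The answer's query: joins through People and Roles and filters on Role.
    return conn.execute("""
        SELECT Requests.RequestsID, Requests.RequesterID, Requests.ProductsID
        FROM People
        INNER JOIN Roles ON People.RolesID = Roles.RolesID
        INNER JOIN Requests ON People.PeopleID = Requests.RequesterID
        WHERE Roles.Role <> 'SalesGuy'
        ORDER BY Requests.RequestsID
    """).fetchall()


def selectable_people(conn):
    # Hypothetical ComboBox source: every person who is not a SalesGuy.
    return conn.execute("""
        SELECT People.PeopleID, People.FullName
        FROM People
        INNER JOIN Roles ON People.RolesID = Roles.RolesID
        WHERE Roles.Role <> 'SalesGuy'
        ORDER BY People.PeopleID
    """).fetchall()


print(filtered_requests(conn))  # [(100, 5, 1), (102, 5, 3)]
print(selectable_people(conn))  # [(5, 'Erin Eng')]
```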
