Performance issue with Snowflake views - snowflake-cloud-data-platform

I have two different instances of Snowflake. In the first instance, I have a materialized view that brings data together by joining a fact table and 3-4 dimension tables.
Create View V1 as
Select Col1, Col2, Col3, Col4
From Fact_t1
inner join Dim1
inner join Dim2
inner join Dim3
On top of this, I have created a secure view, which I share with the second instance for reporting and BI purposes.
Now, when we query the secure view from the second instance, we face a lot of performance issues, with the query running for over 30 minutes.
We run this on a Small or Medium warehouse depending on the requirement. We tried changing the size to XL and gained some performance, but there still seems to be a lag.
Details of the Snowflake setup:
Warehouse size: Small/Medium
Table volume: 1.4M rows in the fact table, 100K+ rows in the dimension tables
Can you please suggest what else can be done? Is this happening because we are creating a secure view on top of another view?

Snowflake doesn't support multi-table joins in materialized views, so in the first instance you are not creating a materialized view but a regular view. Every time the view is queried, the data is retrieved from the underlying tables in remote storage, unless it is already in the cache and hasn't changed in the last 24 hours.
Since a join of 100K+ x 1.4M rows is happening, millions of rows are pulled from remote storage, and the join then has to run over all of those records, so the processing takes a considerable amount of time.
To improve performance, pre-compute the join into a table instead and reference that table in the secure view.
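Something along these lines could work; the join keys and the column-to-table mapping below are placeholders for your actual model:

-- Materialize the join once (a scheduled task can refresh it as needed).
-- Join keys and column ownership are illustrative.
CREATE OR REPLACE TABLE Fact_Joined AS
SELECT f.Col1, d1.Col2, d2.Col3, d3.Col4
FROM Fact_t1 f
INNER JOIN Dim1 d1 ON f.dim1_id = d1.dim1_id
INNER JOIN Dim2 d2 ON f.dim2_id = d2.dim2_id
INNER JOIN Dim3 d3 ON f.dim3_id = d3.dim3_id;

-- Share this secure view with the second instance instead of the view over the join.
CREATE OR REPLACE SECURE VIEW SV1 AS
SELECT Col1, Col2, Col3, Col4
FROM Fact_Joined;

That way the consumer account only scans the pre-joined table instead of redoing the join on every query.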

Related

Benefits to views in stored procedures

I've tried searching in different ways, but haven't found a clear answer to my question. This question almost answers my query, but not quite.
Besides the obvious readability differences, are there any benefits to using a view in a stored procedure:
SELECT
*
FROM
view1
WHERE
view1.fdate > #start AND
view1.fdate <= #end
...over using a linked table list?
SELECT
*
FROM
table1
INNER JOIN
table2
ON table1.pid = table2.fid
INNER JOIN
table3
ON table1.pid = table3.fid
WHERE
table1.fdate > #start AND
table1.fdate <= #end
It's not all about your app and you.
Think enterprise databases, with tens of different apps accessing the same data and hundreds of individuals querying it for business purposes. How do you explain to each one of those many individuals how to recompose your highly normalized data? Which lookup field maps to which table? How are they joined? And how do you grant read-only access to the data, making sure some sensitive fields are not accessible, without repeating yourself?
You, the DBA, create VIEWs. They denormalize the data into easy-to-process relations for the business people, for the many apps, and for the reports. You grant SELECT permission on the views without granting access to the underlying tables, hiding sensitive private fields. And sometimes you write views because you're tired of being called at midnight because the database is 'down' when it's really Johnny from accounting running a cartesian join.
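For example, a rough sketch with made-up table and column names:

-- Expose only non-sensitive fields, pre-joined for reporting (names are hypothetical).
CREATE VIEW v_customer_balances AS
SELECT c.customer_name, a.account_number, a.balance
FROM customers c
INNER JOIN accounts a ON a.customer_id = c.customer_id;

-- Readers get the view only; the base tables with sensitive fields stay off-limits.
GRANT SELECT ON v_customer_balances TO reporting_role;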
There is no difference. The query plans will be identical in both cases. The query optimizer can even use an indexed view when you don't reference it explicitly (as in case 2).

DB Performance - Left outer join over database function

This is a somewhat complex query which has multiple joins and returns a lot of records with several data fields. Let's say it is basically used to retrieve manager details.
First set of tables (already implemented query):
Select m.name, d.name, d.address, m.salary , m.age,……
From manager m,department d,…..etc
JOINS …..
Assume a manager can have zero or more employees.
Let's say I need to list all employee names for each manager in the result of the first set of tables, including managers who have no employees (meaning I want to keep the manager list from the first set of tables as it is).
To do that I have to access the "employee" table through the "party" table (a few more tables might be involved).
Second set of tables (to be newly connected):
That means there are one or more joins with "employee", "party", …..etc.
I have two approaches for this:
Left outer join the first set of tables to the second set of tables.
Create a user-defined function (UDF) at the DB level for the second set of tables. I would then pass the manager id into this UDF as a parameter and get all the employees (e1, e2, …) back as a formatted string, calling it from the select clause of the first query.
Can someone please suggest which of these two options is better in terms of DB performance?
Go for the JOIN, using appropriate WHERE clauses and indexes.
The database engine is far better at optimizing than you'll ever be. Let it do its job.
Your way sounds like (n+1) query death.
Write a sample query and ask your database to EXPLAIN PLAN to see what the cost is. If you spot a full TABLE SCAN on a large table, that's where an index is missing.
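As a rough set-based sketch (table and key names here are placeholders for yours):

-- One set-based query instead of calling a UDF once per manager.
-- LEFT OUTER JOIN keeps managers with no employees; employee_name is NULL for them.
SELECT m.name AS manager_name,
       d.name AS department_name,
       e.name AS employee_name
FROM manager m
INNER JOIN department d ON d.department_id = m.department_id
LEFT OUTER JOIN party p ON p.manager_id = m.manager_id
LEFT OUTER JOIN employee e ON e.party_id = p.party_id;

Run EXPLAIN PLAN on something like this against your real schema and compare it with the UDF version: the UDF gets executed once per manager row, which is where the (n+1) cost comes from.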

Statistical view vs Materialized view

Statistical view:
This view collects statistics about a table, such as the number of records and the maximum and minimum values of the primary key. This helps the database fetch data quickly for an SQL query.
Materialized View:
This view is like an ordinary view, just presenting an abstraction of the table's data according to the query the view was defined with.
Now, how and when should we use these views in an application? How can they be handy from a DBA's point of view?
The two are not really related at all.
Stats are collected (or should be) as part of everyday operation; they tell the query optimizer about the database: number of rows, distribution of values, etc. This helps the optimizer decide which query plan will be best for getting to the data "on disk".
A materialized view is similar to a normal view (e.g., a "saved" query), however the results are stored rather than the underlying query having to be re-executed each time it's called. There are various options for how to refresh the MV: on demand, each time the base tables are updated, etc.
Materialized views are often used for expensive queries where the results can be somewhat out of date. For example, if you had a table containing each sale made, you might create a MV which contains the total sales for each previous month.
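For example, in Oracle-style syntax (the sales table and columns are made up, and the exact syntax varies by database):

-- Pre-aggregate total sales per month; refreshed on demand, so it may lag the base table slightly.
CREATE MATERIALIZED VIEW monthly_sales
REFRESH COMPLETE ON DEMAND
AS
SELECT TRUNC(sale_date, 'MM') AS sale_month,
       SUM(amount)            AS total_sales
FROM sales
GROUP BY TRUNC(sale_date, 'MM');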

Joining against views in SQLServer with strange query optimizer behavior

I have a complex view that I use to pull a list of primary keys that indicate rows in a table that have been modified between two temporal points.
This view has to query 13 related tables and look at a changelog table to determine if an entity is "dirty" or not.
Even with all of this going on, doing a simple query:
select * from vwDirtyEntities;
Takes only 2 seconds.
However, if I change it to
select
e.Name
from
Entities e
inner join vwDirtyEntities de
on e.Entity_ID = de.Entity_ID
This takes 1.5 minutes.
However, if I do this:
declare @dirtyEntities table
(
Entity_ID uniqueidentifier
)
insert into @dirtyEntities
select * from vwDirtyEntities;
select
e.Name
from
Entities e
inner join @dirtyEntities de
on e.Entity_ID = de.Entity_ID
I get the same results in only 2 seconds.
This leads me to believe that SQL Server is evaluating the view per row when it is joined to Entities, instead of constructing a query plan that folds the single inner join above into the other joins in the view.
Note that I want to join against the full result set from this view, as it filters out only the keys I want internally.
I know I could make it into a materialized view, but this would involve schema binding the view and its dependencies, and I don't like the overhead maintaining the index would cause (this view is only queried for exports, while there are far more writes to the underlying tables).
So, aside from using a table variable to cache the view results, is there any way to tell SQL Server to cache the view while evaluating the join? I tried changing the join order (Select from the view and join against Entities), however that did not make any difference.
The view itself is also very efficient, and there is no room to optimize there.
There is nothing magical about a view. It's a macro that gets expanded: when you JOIN to it, the optimiser expands the view into the main query.
I'll address other points in your post:
You have ruled out an indexed view, and a view can only be treated as a discrete entity when it is indexed.
SQL Server will never do a RBAR query on its own; only developers can write loops.
There is no concept of caching: every query uses the latest data unless you use temp tables.
You insist on using the view, which you've decided is very efficient, but you have no idea how views are treated by the optimizer, and it touches 13 tables.
SQL is declarative: join order usually does not matter.
Many serious DB developers don't use views because of limitations like this: they are not truly reusable because they are macros.
Edit: another possibility is predicate pushing on SQL Server 2005; that is, SQL Server cannot push the JOIN condition "deeper" into the view.
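If the table variable ever becomes a problem (table variables don't get statistics), the temp-table variant of your workaround is the other common way to materialize the view's output before the join; a sketch:

-- Materialize the view's result once into a session-scoped temp table, then join.
SELECT Entity_ID
INTO #DirtyEntities
FROM vwDirtyEntities;

SELECT e.Name
FROM Entities e
INNER JOIN #DirtyEntities de
    ON e.Entity_ID = de.Entity_ID;

DROP TABLE #DirtyEntities;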

In SQL Server, when should I use an indexed view instead of a real table?

I know that in SQL Server you can create indexes on a view, and then the view persists the data from the underlying tables so you can query the view directly. But why would I need to use a view instead of a table?
You may want to use a view to simplify queries. In our projects, the consensus is to use views for interfaces, and especially "report interfaces".
Imagine you've got a client table, and the manager wants a report every morning with each client's name and account balance (or whatever). If you code your report against the table, you're creating a strong link between your report and your table, making later changes difficult.
On the other hand, if your report hits a view, you can reshape the database freely; as long as the view stays the same, the report works, the manager is happy, and you're free to experiment with the database. You want to separate client metadata from the main client table? Go for it, and join the two tables in the view. You want to denormalize the cart info for the client? No problem, the view can adapt...
To be honest, that's my view as a programmer, but DB gurus will certainly find other uses :)
One advantage of an indexed view is ordering results by two or more columns that live in different tables, i.e. a view which joins table1 and table2 and is sorted by table1.column1, table2.column2. You could then create an index on (column1, column2) to optimise that query.
A table is where the data is physically stored.
A view is where tables are summarized or grouped to make groups of tables easier to use.
An indexed view allows a query to use a view, and not need to get data from the underlying table, as the view already has the data, thus increasing performance.
You could not achieve the same result with just tables, without denormalizing your database, and thus potentially creating other issues.
Basically, use a view:
When you use the same complex query on many tables, multiple times.
When a new system needs to read old table data but doesn't want to change its perceived schema.
Indexed views can improve performance by creating a more specific index without increasing redundancy.
A view is simply a SELECT statement that has been given a name and stored in a database. The main advantage of a view is that once it's created, it acts like a table for any other SELECT statements that you want to write.
The select statement for the view can reference tables, other views and functions.
You can create an index on the view (indexed view) to improve performance. An indexed view is self-updating, immediately reflecting changes to the underlying tables.
If your indexed view only selects columns from one table, you could just as well place the index on that table and query the table directly; the view would only add overhead for your database. However, if your SELECT statement covers multiple tables with joins etc., then you could gain a performance boost by placing an index on the view.
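For example, a minimal sketch of an indexed view in SQL Server (the Orders/Customers tables and columns are hypothetical):

-- The view must be schema-bound and use two-part names before it can be indexed.
CREATE VIEW dbo.vCustomerOrders
WITH SCHEMABINDING
AS
SELECT o.OrderId, o.OrderDate, c.CustomerName
FROM dbo.Orders o
INNER JOIN dbo.Customers c ON c.CustomerId = o.CustomerId;
GO

-- The first index on a view must be a unique clustered index; once it exists, the view's results are persisted.
CREATE UNIQUE CLUSTERED INDEX IX_vCustomerOrders
ON dbo.vCustomerOrders (OrderId);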
