I'm trying to join my fact table to my dim table. A materialized view has been created on my fact table to help with performance when summing totals. However, I'm seeing that my MV isn't being used in example #1. The only time it is used is when I write an aggregated sub-query, as in example #2.
The examples below use data from Snowflake's sample data.
Do I always have to write my query like example #2 to make use of it?
--creating the MV
create or replace materialized view my_db.public.inventory_mv as
select inv_item_sk,
       sum(INV_QUANTITY_ON_HAND) as INV_QUANTITY_ON_HAND
from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."INVENTORY"
group by 1;
--Example #1 - My MV does not get used according to the query plan
select
b.I_PRODUCT_NAME
,sum(a.INV_QUANTITY_ON_HAND) INV_QUANTITY_ON_HAND
from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."INVENTORY" a
join "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."ITEM" b on a.inv_item_sk = b.i_item_sk
group by 1
--Example #2 - The query planner indicates MV is used
select
b.I_PRODUCT_NAME
,sum(a.INV_QUANTITY_ON_HAND) INV_QUANTITY_ON_HAND
from (select inv_item_sk,sum(INV_QUANTITY_ON_HAND) as INV_QUANTITY_ON_HAND from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."INVENTORY" group by 1) a
join "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."ITEM" b on a.inv_item_sk = b.i_item_sk
group by 1
Even if a materialized view can replace the base table in a particular query, the optimizer might not use the materialized view. For example, if the base table is clustered by a field, the optimizer might choose to scan the base table (rather than the materialized view) because the optimizer can effectively prune out partitions and provide equivalent performance using the base table.
https://docs.snowflake.com/en/user-guide/views-materialized.html#how-the-query-optimizer-uses-materialized-views
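Besides the query profile, one way to see which choice the optimizer made is Snowflake's EXPLAIN; a quick sketch against example #1 (the exact plan text varies by account and version):
explain
select
b.I_PRODUCT_NAME
,sum(a.INV_QUANTITY_ON_HAND) INV_QUANTITY_ON_HAND
from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."INVENTORY" a
join "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."ITEM" b on a.inv_item_sk = b.i_item_sk
group by 1;
--if the rewrite happens, INVENTORY_MV should appear among the scanned objects instead of INVENTORY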
It's not using the MV because the MV and the query are grouping by different columns.
--creating the MV
create or replace materialized view my_db.public.inventory_mv as
(select inv_item_sk ... group by 1)
The MV definition is grouping by inv_item_sk.
--Example #1 - My MV does not get used according to the query plan
select
b.I_PRODUCT_NAME ...
group by 1
The query is grouping by I_PRODUCT_NAME.
Since the MV and the query are grouping by different columns, the optimizer will not substitute the MV. In the second example the inline sub-query matches the MV definition exactly, so the optimizer can (and does) use the MV for it.
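If you'd rather not depend on the automatic rewrite at all, you can reference the MV directly; a sketch using the MV created above:
--query the MV explicitly instead of relying on automatic rewrite
select
b.I_PRODUCT_NAME
,sum(a.INV_QUANTITY_ON_HAND) INV_QUANTITY_ON_HAND
from my_db.public.inventory_mv a
join "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."ITEM" b on a.inv_item_sk = b.i_item_sk
group by 1;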
Related
Why is it that, when I query my base table with the following aggregate query, Snowflake doesn't reference my MV?
create or replace table customer_sample as (
SELECT * FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF100TCL"."CUSTOMER");
create or replace materialized view customer_sample_mv
as
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1;
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1;
(Query Profile screenshot: the plan does not reference the MV)
There are lots of possible reasons, e.g. (a quick check for the first of these is sketched after this list):
The MV was still being built when you executed the query
Snowflake determined it was quicker to execute the query without using the MV
The user running the query didn’t have the required privileges on the MV
etc.
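To rule out the build/refresh state, you can inspect the MV's metadata (a sketch; behind_by and refreshed_on are columns of the SHOW output):
show materialized views like 'customer_sample_mv';
--behind_by shows how far the MV lags its base table; refreshed_on shows the last refresh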
In this example Snowflake is doing the right thing by skipping the materialized view.
First surprise: Scanning the materialized view is slower than just re-running the query:
select *
from customer_sample_mv
order by total_sum desc nulls last
limit 100;
-- 4.4s
vs
select *
from (
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1
)
order by total_sum desc nulls last
limit 100;
-- 3.6s
So Snowflake is saving time by not choosing the materialized view.
How is this possible?
Well, it turns out there are no repeated customer ids, so pre-grouping them does nothing.
select c_customer_sk, count(*) c
from customer_sample
group by 1
having c>1
order by 2 desc
limit 10;
-- no rows returned
From the docs:
Even if a materialized view can replace the base table in a particular query, the optimizer might not use the materialized view. For example, if the base table is clustered by a field, the optimizer might choose to scan the base table (rather than the materialized view) because the optimizer can effectively prune out partitions and provide equivalent performance using the base table.
https://docs.snowflake.com/en/user-guide/views-materialized.html#how-the-query-optimizer-uses-materialized-views
I am creating a view in Snowflake that has a CTE over the base table without any filters. I have other CTEs that depend on the parent CTE to fetch further information.
Everything works fine when I query all records from the base table, which has 45K rows. But when I query the view for one particular ID, the explain plan shows the base CTE picking up all 45K rows, joining the rest of the CTEs against those 45K rows, and only then applying my unique-ID filter to return one row.
So I see no difference in performance between pulling all records and pulling one record; Snowflake is not pushing my filter criteria down into the base CTE.
Any suggestions on how I can resolve this issue? I used local variables in the filter criteria of the base CTE, but that is not a viable solution.
CREATE OR REPLACE VIEW test_v AS
WITH parent_cte as
    (select document_id, time, ...
     from audit_table),
emp_cte as
    (select employee_details, ...
     from employee_tab,
          parent_cte
     where parent_cte.document_id = employee_tab.document_id),
dep_cte as
    (select dep_details, ....
     from dependent_tab,
          emp_cte
     where ..........)
select *
from dep_cte, emp_cte, parent_cte;
Now when I query the view for one document_id, the plan fetches all the data, does the joins, and only then applies the filter, which is not efficient.
select * from test_v where document_id = '1001';
I can't combine these tables into one SELECT with join conditions because I am using "LATERAL FLATTEN", which cross-multiplies each base table record, so I am going with the CTE approach (the pattern is sketched below).
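For context, the FLATTEN pattern being described looks roughly like this (a sketch; the payload column and element names are illustrative, not the actual schema):
select a.document_id,
       f.value:employee_id::number as employee_id
from audit_table a,
     lateral flatten(input => a.payload:employees) f;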
Appreciate your ideas.
I have a stored procedure that returns a dataset from a dynamic pivot query (meaning the pivot columns aren't known until run-time because they are driven by data).
The first column in this dataset is a product id. I want to join that product id with another product table that has all sorts of other columns that were created at design time.
So, I have a normal table with a product id column, and I have a "dynamic" dataset that also has a product id column, which I get by calling a stored procedure. How can I inner join those two?
Dynamic SQL is very powerful, but it has some severe drawbacks. One of them is exactly this: you cannot use its result in ad-hoc SQL.
The only way to get the result of an SP into a table is to create a table with a fitting schema and use the INSERT INTO NewTbl EXEC ... syntax.
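A minimal sketch of that pattern, assuming a hypothetical procedure dbo.GetProductPivot whose result columns are known in advance (which is exactly what a dynamic pivot does not give you):
-- The target table's schema must match the procedure's result set exactly.
CREATE TABLE #PivotResult (ProductId INT, Col1 INT, Col2 INT);
INSERT INTO #PivotResult EXEC dbo.GetProductPivot;
-- Once captured, the result joins like any other table.
SELECT p.*, r.Col1, r.Col2
FROM dbo.Product AS p
INNER JOIN #PivotResult AS r ON r.ProductId = p.ProductId;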
But there are other possibilities:
1) Use SELECT ... INTO ... FROM
Within your SP, when the dynamic SQL is executed, you could add INTO NewTbl to your select:
SELECT Col1, Col2, [...] INTO NewTbl FROM ...
This will create a table with the fitting schema automatically.
You might even hand in the name of the new table as a parameter (it is dynamic SQL anyway), but in that case handling the join outside becomes more difficult, because it must be dynamic again.
If you need your SP to return the result, you just add SELECT * FROM NewTbl. This will return the same resultset as before.
Outside your SP you can join this table as any normal table...
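A rough sketch of option 1; all object names (dbo.Sales, dbo.PivotResult, the pivot columns) are hypothetical stand-ins:
-- @pivotColumnList (e.g. N'[2020],[2021]') would be assembled from data
-- earlier in the procedure; it is hard-coded here to keep the sketch short.
DECLARE @pivotColumnList NVARCHAR(MAX) = N'[2020],[2021]';
DECLARE @sql NVARCHAR(MAX) =
    N'SELECT ProductId, ' + @pivotColumnList + N'
      INTO dbo.PivotResult
      FROM dbo.Sales
      PIVOT (SUM(Amount) FOR Period IN (' + @pivotColumnList + N')) AS pvt;';
EXEC sp_executesql @sql; -- creates dbo.PivotResult with the fitting schema
-- Outside the procedure, the new table joins like any other:
SELECT pr.*, p.ProductName
FROM dbo.PivotResult AS pr
INNER JOIN dbo.Product AS p ON p.ProductId = pr.ProductId;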
BUT, there is a big BUT - oops, this sounds nasty somehow - this will fail if the table already exists...
So you have to drop it first, which can lead to deep trouble if this is a multi-user application with possible concurrency.
If not: use IF EXISTS(SELECT 1 FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='NewTbl') DROP TABLE NewTbl;
If yes: create the table with a name you pass in as a parameter, and run your external query dynamically against that name.
After the drop you can re-create the table using the SELECT ... INTO syntax...
2) Use XML
One advantage of XML is the fact that any structure and any amount of data can be stuffed into one single column.
Let your SP return a table with one single XML column. You can - as you know the schema now - create a table and use INSERT INTO XmlTable EXEC ....
Knowing that there will be a ProductID element, you can extract this value and create a two-column derived table with the ID and the corresponding XML. This is easy to join.
Using wildcards in XQuery makes it possible to query XML data without knowing all the details...
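A sketch of the XML variant; the procedure name and the /row/ProductID shape are assumptions about what your SP would emit:
-- One XML column can hold the whole dynamic row.
CREATE TABLE #XmlResult (Payload XML);
INSERT INTO #XmlResult EXEC dbo.GetProductPivotXml;
-- Derive the known ProductID element, keep the rest as XML, then join.
SELECT d.ProductId, d.Payload, p.ProductName
FROM (SELECT x.Payload.value('(/row/ProductID)[1]', 'int') AS ProductId,
             x.Payload
      FROM #XmlResult AS x) AS d
INNER JOIN dbo.Product AS p ON p.ProductId = d.ProductId;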
3) This was my favourite: Don't use dynamic queries...
I'm pretty sure Oracle supports this, so I have no idea what I'm doing wrong. This code works:
CREATE MATERIALIZED VIEW MV_Test
NOLOGGING
CACHE
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT V.* FROM TPM_PROJECTVERSION V;
If I add in a JOIN, it breaks:
CREATE MATERIALIZED VIEW MV_Test
NOLOGGING
CACHE
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT V.*, P.* FROM TPM_PROJECTVERSION V
INNER JOIN TPM_PROJECT P ON P.PROJECTID = V.PROJECTID
Now I get the error:
ORA-12054: cannot set the ON COMMIT refresh attribute for the materialized view
I've created materialized view logs on both TPM_PROJECT and TPM_PROJECTVERSION. TPM_PROJECT has a primary key of PROJECTID and TPM_PROJECTVERSION has a compound primary key of (PROJECTID,VERSIONID). What's the trick to this? I've been digging through Oracle manuals to no avail. Thanks!
To start with, from the Oracle Database Data Warehousing Guide:
Restrictions on Fast Refresh on Materialized Views with Joins Only
...
Rowids of all the tables in the FROM list must appear in the SELECT list of the query.
This means that your statement will need to look something like this:
CREATE MATERIALIZED VIEW MV_Test
NOLOGGING
CACHE
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT V.*, P.*, V.ROWID as V_ROWID, P.ROWID as P_ROWID
FROM TPM_PROJECTVERSION V,
TPM_PROJECT P
WHERE P.PROJECTID = V.PROJECTID
Another key aspect to note is that your materialized view logs must be created WITH ROWID.
Below is a functional test scenario:
CREATE TABLE foo(foo NUMBER, CONSTRAINT foo_pk PRIMARY KEY(foo));
CREATE MATERIALIZED VIEW LOG ON foo WITH ROWID;
CREATE TABLE bar(foo NUMBER, bar NUMBER, CONSTRAINT bar_pk PRIMARY KEY(foo, bar));
CREATE MATERIALIZED VIEW LOG ON bar WITH ROWID;
CREATE MATERIALIZED VIEW foo_bar
NOLOGGING
CACHE
BUILD IMMEDIATE
REFRESH FAST ON COMMIT AS SELECT foo.foo,
bar.bar,
foo.ROWID AS foo_rowid,
bar.ROWID AS bar_rowid
FROM foo, bar
WHERE foo.foo = bar.foo;
Have you tried it without the ANSI join?
CREATE MATERIALIZED VIEW MV_Test
NOLOGGING
CACHE
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT V.*, P.* FROM TPM_PROJECTVERSION V,TPM_PROJECT P
WHERE P.PROJECTID = V.PROJECTID
You will get this error on REFRESH FAST if you do not create materialized view logs for the master table(s) the query refers to. If you are not familiar with materialized views or are using them for the first time, the better way is to use Oracle SQL Developer and set the options graphically; the errors it reports also make much better sense.
The key checks for FAST REFRESH include the following:
1) An Oracle materialized view log must be present for each base table.
2) The rowids of all the base tables must appear in the SELECT list of the MVIEW query definition.
3) If there are outer joins, unique constraints must be placed on the join columns of the inner table.
No. 3 is easy to miss and is worth highlighting here; a sketch follows.
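To illustrate No. 3 with the question's tables (a sketch, not tested against your schema: it assumes TPM_PROJECT is the inner/optional side of the outer join, so its PROJECTID primary key supplies the required unique constraint):
CREATE MATERIALIZED VIEW LOG ON TPM_PROJECT WITH ROWID;
CREATE MATERIALIZED VIEW LOG ON TPM_PROJECTVERSION WITH ROWID;
CREATE MATERIALIZED VIEW MV_OUTER_TEST
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
AS
SELECT V.PROJECTID,
       V.VERSIONID,
       V.ROWID AS V_ROWID,
       P.ROWID AS P_ROWID
FROM TPM_PROJECTVERSION V,
     TPM_PROJECT P
WHERE V.PROJECTID = P.PROJECTID(+);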
USE THIS CODE
CREATE MATERIALIZED VIEW MV_ptbl_Category2
BUILD IMMEDIATE
REFRESH FORCE
ON COMMIT
AS
SELECT *
FROM ptbl_Category2;
Note: MV_ptbl_Category2 is the materialized view name and ptbl_Category2 is the table name.
I have a view vwBalance which returns more than 150,000,000 rows; below is the code:
SELECT *
FROM dbo.balance
INNER JOIN dbo.NomBalance ON dbo.balance.IdNomBil = dbo.NomBalance.Id
But I want to transpose the values returned, so I use the PIVOT function like this:
SELECT An, cui, caen, col1, col2, ... col100
FROM (SELECT cui, valoare, cod_campbil, caen, An
      FROM vwBalance WITH (NOLOCK)) p
PIVOT (MAX(valoare) FOR cod_campbil IN (col1, col2, ... col100)) AS pvt
The questions are:
Should I use a query hint inside the view vwBalance? Could such a hint improve, or could it block, the transpose?
Is it a problem if I use the NOLOCK hint instead of other query hints?
Are there better ways to improve transposing this many columns?
Thanks!
I can give the following advice:
you can use the READPAST hint if it does not break your business logic
you can create a clustered index on this view; that materializes the view, but the performance of data-changing operations will decrease for all tables used in the view (a sketch follows this list)
you should also check that the fields used in your JOIN and WHERE clauses are indexed
and you can use preprocessing: insert these values into another table (for example, at night), where you can use a columnstore index or page compression, as well as page compression for all the tables used in this view
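A minimal sketch of the indexed-view idea from the second bullet; it assumes the listed columns live on dbo.balance and that (cui, An, cod_campbil) uniquely identifies a row, which you would need to verify:
-- Indexed views require SCHEMABINDING and two-part names.
CREATE VIEW dbo.vwBalanceIndexed
WITH SCHEMABINDING
AS
SELECT b.An, b.cui, b.caen, b.valoare, b.cod_campbil
FROM dbo.balance AS b
INNER JOIN dbo.NomBalance AS n ON b.IdNomBil = n.Id;
GO
-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_vwBalanceIndexed
ON dbo.vwBalanceIndexed (cui, An, cod_campbil);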