Why isn't Snowflake using my materialized view?

Why is it that when I query my base table with the following aggregate query, Snowflake doesn't reference my MV?
create or replace table customer_sample as (
SELECT * FROM
"SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF100TCL"."CUSTOMER");
create or replace materialized view customer_sample_mv
as
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1;
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1;
Query Profile

There are lots of possible reasons, for example:
The MV was still being built when you executed the query
Snowflake determined it was quicker to execute the query without using the MV
The user running the query didn’t have the required privileges on the MV
etc.

In this example Snowflake is doing the right thing by skipping the materialized view.
First surprise: Scanning the materialized view is slower than just re-running the query:
select *
from customer_sample_mv
order by total_sum desc nulls last
limit 100;
-- 4.4s
vs
select *
from (
select c_customer_sk,
sum(c_current_hdemo_sk) total_sum
from customer_sample
group by 1
)
order by total_sum desc nulls last
limit 100;
-- 3.6s
So Snowflake is saving time by not choosing the materialized view.
How is this possible?
Well, it turns out there are no repeated customer IDs, so every group holds exactly one row and pre-grouping them saves nothing.
select c_customer_sk, count(*) c
from customer_sample
group by 1
having c>1
order by 2 desc
limit 10;
-- null
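To make the "pre-grouping does nothing" point concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for Snowflake. The three-row table is hypothetical sample data, not the real TPCDS table. It only illustrates that when the grouping key is unique, the grouped result has as many rows as the base table, so a pre-aggregated copy is no smaller to scan.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_sample (c_customer_sk INTEGER, c_current_hdemo_sk INTEGER);
    -- Hypothetical rows: every c_customer_sk appears exactly once.
    INSERT INTO customer_sample VALUES (1, 100), (2, 200), (3, 300);
""")

# Compare the row count with the number of distinct grouping keys.
total_rows, distinct_keys = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT c_customer_sk) FROM customer_sample"
).fetchone()

# Row count of the aggregate that the materialized view would store.
grouped_rows = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT c_customer_sk, SUM(c_current_hdemo_sk) AS total_sum
        FROM customer_sample
        GROUP BY 1
    )
""").fetchone()[0]

# When the key is unique, grouping does not reduce the number of rows at all.
print(total_rows, distinct_keys, grouped_rows)
```

If the first two numbers match on your own table, an MV grouped by that key is unlikely to be cheaper to scan than the base table.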
From the docs:
Even if a materialized view can replace the base table in a particular query, the optimizer might not use the materialized view. For example, if the base table is clustered by a field, the optimizer might choose to scan the base table (rather than the materialized view) because the optimizer can effectively prune out partitions and provide equivalent performance using the base table.
https://docs.snowflake.com/en/user-guide/views-materialized.html#how-the-query-optimizer-uses-materialized-views

Related

Snowflake won't use materialized view if I join tables

I'm trying to join my fact table to my dim table. A materialized view has been created on my fact table to help with performance when getting the sum of totals. However, I'm seeing that my MV isn't being used in example #1. The only time it is used is when I write an aggregated sub-query as in example #2.
The examples below use data from Snowflake's sample data.
Do I always have to write my query like example #2 to make use of it?
--creating the MV
create or replace materialized view my_db.public.inventory_mv as
(select inv_item_sk,sum(INV_QUANTITY_ON_HAND) as INV_QUANTITY_ON_HAND from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."INVENTORY" group by 1)
--Example #1 - My MV does not get used according to the query plan
select
b.I_PRODUCT_NAME
,sum(a.INV_QUANTITY_ON_HAND) INV_QUANTITY_ON_HAND
from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."INVENTORY" a
join "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."ITEM" b on a.inv_item_sk = b.i_item_sk
group by 1
--Example #2 - The query planner indicates MV is used
select
b.I_PRODUCT_NAME
,sum(a.INV_QUANTITY_ON_HAND) INV_QUANTITY_ON_HAND
from (select inv_item_sk,sum(INV_QUANTITY_ON_HAND) as INV_QUANTITY_ON_HAND from "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."INVENTORY" group by 1) a
join "SNOWFLAKE_SAMPLE_DATA"."TPCDS_SF10TCL"."ITEM" b on a.inv_item_sk = b.i_item_sk
group by 1
Even if a materialized view can replace the base table in a particular query, the optimizer might not use the materialized view. For example, if the base table is clustered by a field, the optimizer might choose to scan the base table (rather than the materialized view) because the optimizer can effectively prune out partitions and provide equivalent performance using the base table.
https://docs.snowflake.com/en/user-guide/views-materialized.html#how-the-query-optimizer-uses-materialized-views
It's not using the MV because the MV and the query group by different columns.
--creating the MV
create or replace materialized view my_db.public.inventory_mv as
(select inv_item_sk ... group by 1)
The MV definition is grouping by inv_item_sk.
--Example #1 - My MV does not get used according to the query plan
select
b.I_PRODUCT_NAME ...
group by 1
The query is grouping by I_PRODUCT_NAME.
Since the MV and the query group by different columns, the optimizer will not use the MV. In the second example, the sub-query in the FROM clause matches the MV definition, so the optimizer can substitute the MV for it.
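As a sanity check on the rewrite in example #2, here is a small sketch using Python's built-in sqlite3 in place of Snowflake. The table contents are made up for illustration. It only shows that pre-aggregating by inv_item_sk and then joining gives the same totals as aggregating after the join; it says nothing about whether any optimizer will pick a materialized view.

```python
import sqlite3

# In-memory SQLite database standing in for Snowflake (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (inv_item_sk INTEGER, inv_quantity_on_hand INTEGER);
    CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_product_name TEXT);
    -- Hypothetical data: items 1 and 3 share the product name 'widget'.
    INSERT INTO inventory VALUES (1, 10), (1, 5), (2, 7), (3, 4), (3, 6);
    INSERT INTO item VALUES (1, 'widget'), (2, 'gadget'), (3, 'widget');
""")

# Shape of example #1: join first, then group by the product name.
direct = conn.execute("""
    SELECT b.i_product_name, SUM(a.inv_quantity_on_hand)
    FROM inventory a
    JOIN item b ON a.inv_item_sk = b.i_item_sk
    GROUP BY 1
    ORDER BY 1
""").fetchall()

# Shape of example #2: aggregate by inv_item_sk first (what the MV stores), then join.
pre_aggregated = conn.execute("""
    SELECT b.i_product_name, SUM(a.inv_quantity_on_hand)
    FROM (SELECT inv_item_sk, SUM(inv_quantity_on_hand) AS inv_quantity_on_hand
          FROM inventory
          GROUP BY 1) a
    JOIN item b ON a.inv_item_sk = b.i_item_sk
    GROUP BY 1
    ORDER BY 1
""").fetchall()

print(direct)
print(pre_aggregated)
```

Both queries return the same rows because SUM can be re-aggregated from partial sums, which is why the sub-query form is a safe rewrite here.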

Snowflake - Dynamic Filter Condition inside CTE (Common Table Expression)

I am creating a view in Snowflake that has a CTE on a base table without any filters. Other CTEs depend on this parent CTE to fetch further information.
Everything works fine when I query all records from the base table, which has 45K rows. But when I query the view for one particular ID, the explain plan shows the base CTE picking up all 45K rows, joining the rest of the CTEs on those 45K rows, and only then applying my ID filter and returning one row.
I see no difference in performance between pulling all records and pulling one record. Snowflake is not pushing my filter criteria down into the base CTE.
Any suggestions on how I can resolve this issue? I tried local variables in the filter criteria of the base CTE, but that is not a viable solution.
CREATE OR REPLACE VIEW test_v AS
WITH parent_cte as
(select document_id, time, ...
from audit_table
),
emp_cte as
(select employee_details, ...
from employee_tab,
parent_cte
where parent_cte.document_id = employee_tab.document_id),
dep_cte as
(select dep_details, ....
from dependent_tab,
emp_cte
where ..........)
select *
from dep_cte, emp_cte, parent_cte;
Now when I query the view for one document_id, the plan fetches all the data, joins it, and only then applies the filter, which is not efficient.
select * from test_v where document_id = '1001';
I can't combine these tables in one SELECT with join conditions because I am using "LATERAL FLATTEN", which cross-multiplies each base table record, so I am going with the CTE approach.
Appreciate your ideas.

Index for using IN clause in where condition

My application accesses data from a table in SQL Server. Say the table is named PurchaseDetail and has some other columns as well.
The select query filters on the columns below.
1. name - name has only 10000 values.
2. createdDateTime
The actual query is
select *
from PurchaseDetail
where name in (~2000 name)
and createdDateTime = 'someDateValue';
The SQL tuning advisor gave some recommendations. I tried the recommended indexes. Performance improved a bit, but not enough.
Is there anything wrong with my query? Is there any way to change or improve my select query?
I ask because I have not used IN in a WHERE clause before. My table has more than 100 million records.
Any suggestions, please?
In this case, using IN with that many values is not a good idea.
The better way is to use an INNER JOIN instead.
Insert those names into a temp table and INNER JOIN it in your SELECT query.
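The temp-table approach can be sketched as follows. This uses Python's built-in sqlite3 rather than SQL Server so that it runs anywhere. The temp table name wanted_names and the sample rows are hypothetical, and in SQL Server the temp table would be written as #wanted_names. It shows only the shape of the rewrite, not its performance.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PurchaseDetail (
        id INTEGER PRIMARY KEY,
        name TEXT,
        createdDateTime TEXT
    );
    -- Hypothetical sample rows.
    INSERT INTO PurchaseDetail (name, createdDateTime) VALUES
        ('alice', '2020-01-01'), ('bob', '2020-01-01'),
        ('carol', '2020-01-01'), ('alice', '2020-01-02');
    -- Temp table holding the names that would otherwise go in the IN list.
    CREATE TEMP TABLE wanted_names (name TEXT PRIMARY KEY);
""")

# Load the (normally ~2000) names with one parameterized bulk insert.
names = ["alice", "carol"]
conn.executemany("INSERT INTO wanted_names (name) VALUES (?)", [(n,) for n in names])

# Join against the temp table instead of listing every name in an IN clause.
rows = conn.execute("""
    SELECT p.name, p.createdDateTime
    FROM PurchaseDetail AS p
    INNER JOIN wanted_names AS w ON w.name = p.name
    WHERE p.createdDateTime = ?
    ORDER BY p.name
""", ("2020-01-01",)).fetchall()

print(rows)
```

The primary key on the temp table keeps the names unique, so the join cannot duplicate rows from PurchaseDetail.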

Does MS SQL Server automatically create a temp table if the query contains a lot of IDs in the IN clause?

I have a big query to get multiple rows by their IDs, like
SELECT *
FROM TABLE
WHERE Id in (1001..10000)
This query runs very slowly and ends with a timeout exception.
My temporary fix is to query in batches, breaking this query into 10 parts of 1000 IDs each.
I heard that using temp tables may help in this case, but it also looks like MS SQL Server does this automatically underneath.
What is the best way to handle problems like this?
You could write the query as follows using a temporary table:
CREATE TABLE #ids(Id INT NOT NULL PRIMARY KEY);
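-- Placeholder list: replace the comment with your individual Ids.
-- SQL Server accepts at most 1000 rows per VALUES list, so split longer lists across several INSERT statements.
INSERT INTO #ids(Id) VALUES (1001),(1002) /* ,(...) add your individual Ids here */ ,(10000);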
SELECT
t.*
FROM
[Table] AS t
INNER JOIN #ids AS ids ON
ids.Id=t.Id;
DROP TABLE #ids;
My guess is that it will probably run faster than your original query. Lookup can be done directly using an index (if it exists on the [Table].Id column).
Your original query translates to
SELECT *
FROM [TABLE]
WHERE Id=1001 OR Id=1002 OR /*...*/ OR Id=10000;
This would require evaluation of the expression Id=1001 OR Id=1002 OR /*...*/ OR Id=10000 for every row in [Table], which probably takes longer than with a temporary table. The example with a temporary table takes each Id in #ids and looks for a corresponding Id in [Table] using an index.
This all assumes that there are gaps in the Ids between 1001 and 10000. Otherwise it would be easier to write
SELECT *
FROM [TABLE]
WHERE Id BETWEEN 1001 AND 10000;
This would also require an index on [Table].Id to speed it up.

sql server: cannot replicate same order when creating tables

When I run this code, it gives me different sorting results. When I manually do this in Excel, I always get the same results. Can anyone help? Thanks.
select * into tblVSOELookupSort1 from tblVSOELookup order by
[SKU],[ASP Local],[Sum of Qty]
alter table tblVSOELookupSort1 add RowID int identity(1,1) not null
select * into tblVSOELookupSort2 from tblVSOELookupSort1 order by
[Region Per L/U],[Currency]
drop table tblVSOELookupSort1
drop table tblVSOELookup
exec sp_rename tblVSOELookupSort2, tblVSOELookup
select * from tblVSOELookup
That's normal. SQL databases in general do not guarantee a particular row ordering of results unless you specify one. The order is dependent on the RDBMS implementation, query plan, and other things. If you want a particular row ordering in your query results, you must include an ORDER BY clause in your query. In this case, select * from tblVSOELookup order by ....
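Here is a minimal sketch of that advice, using Python's built-in sqlite3 instead of SQL Server so it can be run as-is. The lookup table and its rows are hypothetical. The point is only that the ordering is stated in the final SELECT, rather than relying on the order in which rows were inserted into the table.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup (sku TEXT, region TEXT, qty INTEGER);
    -- Hypothetical rows, deliberately inserted in no particular order.
    INSERT INTO lookup VALUES ('B', 'EU', 2), ('A', 'US', 5), ('A', 'EU', 1);
""")

# The only ordering that is guaranteed is the one requested in this query.
ordered = conn.execute("""
    SELECT sku, region, qty
    FROM lookup
    ORDER BY region, sku, qty
""").fetchall()

print(ordered)
```

Listing every sort column, including tie-breakers, is what makes the result repeatable from run to run.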
