Materialized View vs Table Using dbt - snowflake-cloud-data-platform

Materialized View vs Table Using dbt - snowflake-cloud-data-platform

I'm just onboarding dbt and having gone through the tutorial docs I'm wondering if there's a difference between materializing my transformations as views or tables? I'm using Snowflake as the data warehouse. There's some documentation here that shows the differences between a table and a materialized view but if I'm using dbt to update the tables regularly, do they more or less become the same thing?
Thanks!

dbt doesn't support materialized views, as far as I'm aware, but as Felipe commented, there is an open issue to discuss it. If it were possible to use materialized views on Snowflake, you're right that they somewhat become the same thing. The materialized view would update even if you haven't run dbt. As Drew mentions in the ticket though, there are a lot of caveats that make using tables with dbt preferable in most use cases: "no window functions, no unions, limited aggregates, can't query views, etc etc etc".
That said, dbt does support views and tables.
Even when you're using dbt, there's still a difference between a view and a table. A table will always need to be refreshed by dbt in order to be updated. A view will always be as up-to-date as the underlying tables it is referencing.
For example, let's say you have a dbt model called fct_orders which references a table that is loaded by Fivetran/Stitch called shopify.order. If your model is materialized as a view, it will always return the most up-to-date data in the Shopify table. If it is materialized as a table, and new data has arrived in the Shopify table since you last run dbt, the model will be 'stale'.
That said, the benefit of materializing it as a table is that it will run more quickly, given it's not having to do the SQL 'transformation' each time.
The advice I have seen given most often is something like this:
If using a view isn't too slow for your end-users, use a view.
If a view gets too slow for your end-users, use a table.
If building a table with dbt gets too slow, use incremental models in dbt.

If you use DBT there's little need for materialized views: a materialized view is in fact a table which is based on a query - same as "create table as select". If you have a DBT model you can materialize as a table and you'll get the same result. Now the difference between a table and a materialized view is the fact that the materialized view automatically updates, while the table does not. But if you're using DBT you can schedule a refresh of the table by scheduling DBT.
This will only give you updated data after your scheduled DBT will complete, which is not the same as a materialized view if the underlying table changes frequently, but most people refrain from using materialized views on top of tables that change frequently because the running cost can get out of control.
Materialized views in Snowflake can only query one table, while with DBT there are more options - e.g. join two tables and materialize as a table will give you something you can't do with a materialized view.
Finally, if you really want to deploy materialized views with DBT there are two ways:
Use the pre-hook or the post-hook, which executes any piece of SQL after running the DBT model. That can work but the maintenance is not great.
There is a way to create your own materialization - see https://docs.getdbt.com/docs/guides/creating-new-materializations - this is not an easy task, but that will give you want you want. There's also a GitHub page called dbt-hack which gives interesting techniques on non-standard materializations.

Related

Mirror table vs materialized view

From this excellent video "Microservices Evolution: How to break your monolithic database by Edson Yanaga" I know that there are different ways to split chunk of data as separate db for microservice:
View
Materialized View
Mirror Table using Trigger
Mirror Table using Transactional Code
Mirror Table using ETL tools
Event Sourcing
Could you please explain me the difference between mirrored table and materialized view?
I'm confused due to both of them are stored on disk...

My understanding is :-
Mirrored tables
Mirrored tables are generally an exact copy of the another, source table. Same structure and the same data. Some database platforms allow triggers to be created on the source table which will perform updates on the source table to the mirror table. If the database platform does not provide this functionality, or if the Use Case dictates, you may perform the update in transactional code instead of a trigger.
Materialized Views
A Materialized View contains the result of a query. With a regular database view, when the underlying table data changes, querying the view reflects those changes. However, with a materialized view the data is current only at the point in time of creation (or refresh) of the Materialized view. In simple terms, a materialized view is a snapshot of data at a point in time.

Impala: how to create a materialized view in impala?

Can we create Materialized views in Impala?
If not, what is the alternative solution for better performance of view.

Impala can't create materialized views at this time. So the solution for better view performance would be to load the output of the view query into a table and then have the view query the table or just query the table. As for keeping the data up to date you can take a batch approach of scheduling some DML statement to refresh the data or you could take a streaming approach by using something like Kafka to keep the data up to date.

SQL Server Indexed Views vs Oracle Materialized View

I know materialized view and I'm using it. I have never used indexed views but I will. What are the differences between them ?

SQL Server’s indexed views are always kept up to date. In SQL Server, if a view’s base tables are modified, then the view’s indexes are also kept up to date in the same atomic transaction.
Oracle provides something similar called a materialized view. If Oracle’s materialized views are created without the **REFRESH FAST ON COMMIT** option, then the materialized view is not modified when its base tables are. So that’s one major difference. While SQL Server’s indexed views are always kept current, Oracle’s materialized views can be static.

HBM2DDL -- Create a database view instead of a Table?

All,
Is there some setting that I can tell hbm2ddl to run a view creation statement instead of create a table when generating the database schema?
I'm creating my database schema using the wonderful hbm2ddl tool, but I have one issue. I need to flatten some of the tables into views to aid searching the database, and hql would be overly complex a solution. I've created Entity objects pointed at these views, in order to fetch search results via hibernate. This all works fine, until hbm2ddl is used. In an empty database schema, hbm2ddl will create the database schema based on the jpa annotations, unfortunately, it will also create my views as tables. Is there some setting that I can tell hbm2ddl to run a view creation statement instead of create a table? In lieu of that, is there a way to tell hbm2ddl to skip table creation for an entity (exclude, or something)?
Thanks!

To my knowledge, and this is unfortunate, Hibernate doesn't support things like creating views instead of tables nor validating a schema containing views. See issues like HHH-1872, HHH-2018 or HHH-1329.

ETL : Tracking changes to data using Materialized View log

I am into designing ETL with source and target database as oracle Standard Edition.
For ETL purpose I need to get the changed data everytime.Client does not want any changes to be made in source objects.
Is it feasible to create Materialized view log on source database using dblink to track Inser/Update/Delete on the identified tables.
Thanks and Regards

I do not believe so -- a materialized view log must be created in the same database as the source object. If the database link were unavailable, your materialized view log would then be incomplete or inaccurate, or worse yet, would be blocking DML against the source table.
I'd recommend instead either:
Accepting the overhead of a FULL vs
FAST refreshable materialized view; or
Implementing Streams-based replication
to have your own copy of the table(s) in question,
against which you then implement materialized view logs.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight