I am running more than 40 reports on a daily basis using SSRS. These reports use a similar dataset, with some variations in the conditions.
I think it would be better to use an indexed view rather than the tables; this should improve performance and reduce the run time of each report.
But I do not know how to update/refresh an indexed view after creating it. The underlying table is updated every midnight.
I'm just getting started with dbt, and having gone through the tutorial docs I'm wondering whether there's a difference between materializing my transformations as views or as tables. I'm using Snowflake as the data warehouse. There's some documentation here that shows the differences between a table and a materialized view, but if I'm using dbt to update the tables regularly, do they more or less become the same thing?
Thanks!
dbt doesn't support materialized views, as far as I'm aware, but as Felipe commented, there is an open issue to discuss it. If it were possible to use materialized views on Snowflake, you're right that they somewhat become the same thing. The materialized view would update even if you haven't run dbt. As Drew mentions in the ticket though, there are a lot of caveats that make using tables with dbt preferable in most use cases: "no window functions, no unions, limited aggregates, can't query views, etc etc etc".
That said, dbt does support views and tables.
Even when you're using dbt, there's still a difference between a view and a table. A table will always need to be refreshed by dbt in order to be updated. A view will always be as up-to-date as the underlying tables it is referencing.
For example, let's say you have a dbt model called fct_orders which references a table, shopify.order, that is loaded by Fivetran/Stitch. If your model is materialized as a view, it will always return the most up-to-date data in the Shopify table. If it is materialized as a table, and new data has arrived in the Shopify table since you last ran dbt, the model will be 'stale'.
That said, the benefit of materializing it as a table is that it will run more quickly, since it doesn't have to perform the SQL 'transformation' each time it is queried.
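As a rough illustration, a dbt model along those lines might look like the sketch below. The source definition and column names are assumptions rather than anything from the question, and switching between a view and a table is just a one-line config change.

```sql
-- models/fct_orders.sql (illustrative; assumes a 'shopify' source is declared in dbt)
{{ config(materialized='view') }}  -- change to materialized='table' to persist the result

select
    order_id,
    customer_id,
    created_at,
    total_price
from {{ source('shopify', 'order') }}
```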
The advice I have seen given most often is something like this:
If using a view isn't too slow for your end-users, use a view.
If a view gets too slow for your end-users, use a table.
If building a table with dbt gets too slow, use incremental models in dbt (a sketch follows below).
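A minimal sketch of that last step, assuming a hypothetical order_id key and a created_at timestamp to filter on:

```sql
-- models/fct_orders.sql as an incremental model (illustrative)
{{ config(
    materialized='incremental',
    unique_key='order_id'
) }}

select
    order_id,
    customer_id,
    created_at,
    total_price
from {{ source('shopify', 'order') }}

{% if is_incremental() %}
  -- on incremental runs, only process rows that arrived since the last dbt run
  where created_at > (select max(created_at) from {{ this }})
{% endif %}
```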
If you use DBT there's little need for materialized views: a materialized view is in fact a table that is based on a query - the same as "create table as select". If you have a DBT model you can materialize it as a table and you'll get the same result. The difference between a table and a materialized view is that the materialized view updates automatically, while the table does not. But if you're using DBT you can schedule a refresh of the table by scheduling DBT.
This will only give you updated data after your scheduled DBT run completes, which is not the same as a materialized view if the underlying table changes frequently - but most people refrain from using materialized views on top of tables that change frequently because the running cost can get out of control.
Materialized views in Snowflake can only query one table, while with DBT there are more options - e.g. joining two tables and materializing the result as a table gives you something you can't do with a materialized view.
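To make the comparison concrete, here is a hedged Snowflake sketch. All object names are made up, and note that Snowflake materialized views require Enterprise Edition and only allow a limited set of constructs (single table, no joins or window functions, limited aggregates):

```sql
-- Plain table built from a query ("create table as select"): same result as a
-- materialized view, but it only refreshes when you re-run the statement (e.g. via DBT).
create or replace table analytics.order_totals as
select customer_id, sum(amount) as total_amount
from raw_db.shopify.orders
group by customer_id;

-- Snowflake materialized view: kept up to date automatically, but restricted to
-- querying a single table with a limited set of SQL features.
create or replace materialized view analytics.order_totals_mv as
select customer_id, sum(amount) as total_amount
from raw_db.shopify.orders
group by customer_id;
```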
Finally, if you really want to deploy materialized views with DBT there are two ways:
Use a pre-hook or post-hook, which executes an arbitrary piece of SQL before or after running the DBT model (a rough sketch follows after this list). That can work, but the maintenance is not great.
There is a way to create your own materialization - see https://docs.getdbt.com/docs/guides/creating-new-materializations - this is not an easy task, but it will give you what you want. There's also a GitHub page called dbt-hack which gives interesting techniques for non-standard materializations.
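For the first option, a rough sketch of what a post-hook might look like. The model name, columns, and the stg_orders dependency are all hypothetical, and the Snowflake restrictions above still apply; recreating the underlying table on every run is part of why the maintenance is not great:

```sql
-- models/order_totals.sql (illustrative): materialize a table, then create a
-- materialized view on top of it from a post-hook
{{ config(
    materialized='table',
    post_hook="create or replace materialized view {{ this.schema }}.order_totals_mv as select customer_id, total_amount from {{ this }}"
) }}

select customer_id, sum(amount) as total_amount
from {{ ref('stg_orders') }}
group by customer_id
```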
I have a SQL Server database where we have created some views based on dim and fact tables. I need to build an SSAS tabular model based on my tables and views. But one of the views takes 1.5 hours to run as a SQL query (in SSMS). Now I need to use this same view to build my SSAS tabular model, but 1.5 hours is not acceptable. This view is made up of more than 10 table joins and a lot of WHERE conditions.
1) Can I bring all the tables used in this view into my SSAS tabular model? But then I am not sure how to join them all and apply the WHERE clauses inside SSAS to build something similar to my view. Is that possible? If yes, how?
or
2) If I build the SSAS model from that view once, and then want to incrementally load the data daily, what is the best way to do that?
The best option is to set up a proper ETL process. That is:
Extract the tables from your source SQL database into a new SQL database that you control.
Transform the data into a star schema.
Load the data from the star schema into SSAS.
On SQL Server, the most common approach is to use SSIS packages for data extraction, movement, and orchestration, and SQL Server Agent jobs for scheduling.
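A hedged sketch of what the "transform" step might look like in T-SQL; every object name here is a placeholder rather than anything from your actual schema:

```sql
-- Transform step (illustrative): pre-compute the expensive 10+ table join once,
-- during the nightly ETL, into a star-schema fact table that SSAS can read quickly.
TRUNCATE TABLE dw.FactSales;

INSERT INTO dw.FactSales (DateKey, CustomerKey, ProductKey, Quantity, Amount)
SELECT
    d.DateKey,
    c.CustomerKey,
    p.ProductKey,
    s.Quantity,
    s.Amount
FROM staging.Sales AS s
JOIN dw.DimDate     AS d ON d.[Date]     = CAST(s.OrderDate AS date)
JOIN dw.DimCustomer AS c ON c.CustomerID = s.CustomerID
JOIN dw.DimProduct  AS p ON p.ProductID  = s.ProductID
WHERE s.IsCancelled = 0;   -- whatever business filters your current view applies
```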
To answer your questions:
Yes, it is certainly possible to bring in all of the tables directly from your source system into your tabular model, but please don't do this! You will only create problems for yourself later on when creating DAX calculations. More information here.
Incrementally loading data is something you decide for each table that is imported into your tabular model. Again, this is much easier if you have a proper star schema, as you would typically run a full processing on all your dimension tables, and then do incremental processing only on the largest fact tables.
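For the relational side of that incremental load, a common pattern is a simple watermark query run before you process the corresponding SSAS partition; again, all names here are hypothetical:

```sql
-- Incremental load sketch for a large fact table: append only rows that arrived
-- since the previous load, then process the matching SSAS partition incrementally.
DECLARE @LastLoaded datetime2 =
    (SELECT ISNULL(MAX(OrderDate), '1900-01-01') FROM dw.FactSales);

INSERT INTO dw.FactSales (DateKey, CustomerKey, ProductKey, Quantity, Amount, OrderDate)
SELECT d.DateKey, c.CustomerKey, p.ProductKey, s.Quantity, s.Amount, s.OrderDate
FROM staging.Sales AS s
JOIN dw.DimDate     AS d ON d.[Date]     = CAST(s.OrderDate AS date)
JOIN dw.DimCustomer AS c ON c.CustomerID = s.CustomerID
JOIN dw.DimProduct  AS p ON p.ProductID  = s.ProductID
WHERE s.OrderDate > @LastLoaded;
```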
Can we create Materialized views in Impala?
If not, what is the alternative solution for better view performance?
Impala can't create materialized views at this time. The workaround for better view performance is to load the output of the view's query into a table, and then either have the view query that table or query the table directly. As for keeping the data up to date, you can take a batch approach and schedule a DML statement to refresh the data, or you could take a streaming approach using something like Kafka.
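A hedged Impala sketch of the batch approach; the table and view names are made up:

```sql
-- One-time: materialize the view's output into a Parquet table for fast scans
CREATE TABLE reporting.sales_summary
STORED AS PARQUET
AS
SELECT region, product, SUM(amount) AS total_amount
FROM sales_db.sales_view
GROUP BY region, product;

-- Scheduled refresh (e.g. nightly): rebuild the table's contents from the view query
INSERT OVERWRITE TABLE reporting.sales_summary
SELECT region, product, SUM(amount) AS total_amount
FROM sales_db.sales_view
GROUP BY region, product;
```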
We have a SQL 2005/2008 database that has a table with a computed column. We're using the computed column as a discriminator in NHibernate so having it in the database is proving to be very useful.
In order to gain the benefits of faster integration tests, I'd like to be able to run our integration tests against an in-memory database such as SQLite or SQL CE. But I don't think either of those supports computed columns.
Are there any other solutions to my problem? I have complete access to the database and can modify it if there's a better solution available. I've seen this post that suggests using a view instead of a computed column, is this the best alternative?
What I did was add the computed column to the DataTable when loading the table from SqlCe. I stored the definition of the computed DataColumn in a "configuration" table kept in the database. I was able to do complex calculations that depended on a "chain" of tables, where each table performed a simpler piece of a more complex calculation. (The last table in the chain contained the results.) I used SqlCe because one table of the five contained 15 million rows - too much data for the in-memory data sets of ADO.NET. (I had a requirement to perform local, client-based calculations before posting to the server.)
I have an SSIS project where one of the steps involves populating a SQL Server table from an Oracle Table.
The Oracle table has a column ssis_control_flag. I want to pull across all records that have this field set to 'T'.
Now, I was wondering which would be the best way of doing this, and the two options I have detailed in the question presented themselves.
So really, I am wondering which would be faster/better. Should I create a conditional split in the SSIS package that filters off all the records I want? Or should I create a view in Oracle that selects the records based on the criteria, and utilise that view as the data source in SSIS?
Or is there an even better way of doing this? Your help would be much appreciated!
Thanks
Why don't you use a WHERE clause to filter the records, instead of creating a view? Maybe I am not getting your question correctly.
In general, bringing all the data into SSIS and then filtering it out is not recommended, especially when you can do the filtering at the source DB itself. Consider the network bandwidth costs as well.
And this particular filter cannot be done more efficiently in SSIS than it can be done at the database, so it is better to do it in the Oracle DB itself.
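In practice that just means giving the source component in your data flow a SQL command like the one below, instead of selecting the whole table; the schema, table, and column list are placeholders:

```sql
-- Source query for the SSIS data flow: filter at the Oracle end so only the
-- rows you actually need cross the network.
SELECT col1, col2, col3
FROM oracle_schema.source_table
WHERE ssis_control_flag = 'T'
```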
You can use an OPENROWSET query as the source for the data flow instead of directly accessing the Oracle table.
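If you take the OPENROWSET route from the SQL Server side, a sketch might look like this; the provider, connection details, and column list are assumptions, and ad hoc distributed queries must be enabled on the server:

```sql
-- Requires the 'Ad Hoc Distributed Queries' option to be enabled:
-- EXEC sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE;
SELECT *
FROM OPENROWSET(
    'OraOLEDB.Oracle',
    'OracleTnsAlias';'oracle_user';'oracle_password',
    'SELECT col1, col2, col3
     FROM oracle_schema.source_table
     WHERE ssis_control_flag = ''T'''
);
```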