ETL : Tracking changes to data using Materialized View log - database

I am designing an ETL process where both the source and target databases are Oracle Standard Edition.
For ETL purposes I need to capture the changed data each time, and the client does not want any changes made to the source objects.
Is it feasible to create a materialized view log on the source database over a database link, to track inserts, updates, and deletes on the identified tables?
Thanks and Regards

I do not believe so -- a materialized view log must be created in the same database as the source object. If the database link were unavailable, your materialized view log would then be incomplete or inaccurate, or worse yet, would be blocking DML against the source table.
I'd recommend instead either:
Accepting the overhead of a FULL vs. FAST refreshable materialized view (see the sketch below); or
Implementing Streams-based replication to have your own copy of the table(s) in question, against which you then implement materialized view logs.
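For the first option, a minimal sketch of a complete-refresh materialized view pulled over a database link might look like this (the table name and link name are hypothetical):

-- Complete (FULL) refresh: no materialized view log is needed on the source side,
-- but every refresh re-reads the whole table over the link.
CREATE MATERIALIZED VIEW mv_orders_copy
  REFRESH COMPLETE
  START WITH SYSDATE
  NEXT SYSDATE + 1/24          -- refresh roughly every hour
AS
SELECT * FROM orders@src_link;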

Related

How can I sync a SQL Server view to a Postgres table?

I need to sync data from several tables in a legacy SQL Server db (source) to a single table in a Postgres db (target). The schema of the source db is absurd, so the query to select the data takes a very long time to run. I'm planning to create an indexed view in the source db, and then somehow sync that indexed view to the Postgres table.
Right now, I simply have a scheduled task that drops the Postgres table (target) and then recreates it from scratch by running the complex query in the source db. This was quick to set up, and it ensures that changes in the source db always eventually make it to the target db, but recreating the table every few hours is (understandably) very slow and expensive. I need a way to replicate ongoing changes (only the new/updated data) from the source view to the target table. Is there a (relatively) simple way to do this?
I'm somewhat familiar with CDC, but I understand that CDC cannot be used on a view, so I don't believe that's an option. Adding "updated at" timestamps to the source tables is not an option, so I can't use that approach. I could add a hash column to the source tables, or maybe add a hash column to the view, so that's an option if that would work. Is there an existing tool/service that does what I need?
If you want to view SQL Server DB data in PostgreSQL, then you can also use tds_fdw.
https://github.com/tds-fdw/tds_fdw
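A rough sketch of the tds_fdw approach, with made-up server, schema, and column names:

CREATE EXTENSION tds_fdw;

CREATE SERVER mssql_src FOREIGN DATA WRAPPER tds_fdw
  OPTIONS (servername 'legacy-sql.example.com', port '1433', database 'LegacyDb');

CREATE USER MAPPING FOR CURRENT_USER SERVER mssql_src
  OPTIONS (username 'etl_user', password 'secret');

-- Map the SQL Server view to a foreign table in Postgres
CREATE FOREIGN TABLE legacy_customers (
  id   integer,
  name text
) SERVER mssql_src
  OPTIONS (schema_name 'dbo', table_name 'vw_customers');

-- The view's rows can now be queried (or merged into a local table) from Postgres:
SELECT * FROM legacy_customers;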
Also, there are some third-party tools which could help you to achieve your goal, for example, SymmetricDS
http://www.symmetricds.org/about/overview

Long running view in ssas-tabular

I have a SQL Server database where we have created some views based on dim and fact tables. I need to build an SSAS tabular model based on my tables and views. But one of the views takes 1.5 hours to run as a SQL query (in SSMS). Now I need to use this same view to build my SSAS tabular model, but 1.5 hours is not acceptable. This view is made up of more than 10 table joins and a lot of WHERE conditions.
1) Can I bring all the tables used in this view into my SSAS tabular model? But then I am not sure how to join them all and apply the WHERE clauses inside SSAS to build something similar to my view. Is that possible? If yes, how?
or
2) If I build the SSAS model from that view once, and then want to incrementally load the data daily, what is the best way to do that?
The best option is to set up a proper ETL process. That is:
Extract the tables from your source SQL database into a new SQL database that you control.
Transform the data into a star schema.
Load the data from the star schema into SSAS.
On SQL Server, the most common approach is to use SSIS packages for data extraction, movement, and orchestration, and SQL Server Agent jobs for scheduling.
To answer your questions:
Yes, it is certainly possible to bring in all of the tables directly from your source system into your tabular model, but please don't do this! You will only create problems for yourself later on when creating DAX calculations. More information here.
Incrementally loading data is something you decide for each table that is imported into your tabular model. Again, this is much easier if you have a proper star schema, as you would typically run a full processing on all your dimension tables, and then do incremental processing only on the largest fact tables.
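As a rough illustration of the incremental part of the extract step, a watermark-driven load for one fact table might look like this in T-SQL (table, column, and log names are invented, and it assumes the source rows carry a reliable modified_at timestamp):

-- Find the high-water mark from the previous run
DECLARE @last_load datetime2 =
    (SELECT MAX(loaded_until) FROM etl.load_log WHERE table_name = 'FactSales');

-- Pull only rows changed since the last run into staging
INSERT INTO staging.FactSales (order_id, customer_key, order_date, amount)
SELECT s.order_id, s.customer_key, s.order_date, s.amount
FROM   SourceDb.dbo.Sales AS s
WHERE  s.modified_at > @last_load;

-- Advance the watermark for the next run
UPDATE etl.load_log
SET    loaded_until = SYSUTCDATETIME()
WHERE  table_name = 'FactSales';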

Mirror table vs materialized view

From the excellent video "Microservices Evolution: How to break your monolithic database" by Edson Yanaga, I know that there are different ways to split a chunk of data out into a separate db for a microservice:
View
Materialized View
Mirror Table using Trigger
Mirror Table using Transactional Code
Mirror Table using ETL tools
Event Sourcing
Could you please explain the difference between a mirror table and a materialized view?
I'm confused because both of them are stored on disk...
My understanding is:
Mirrored tables
Mirrored tables are generally an exact copy of another, source table: same structure and same data. Some database platforms allow triggers to be created on the source table that propagate inserts, updates, and deletes to the mirror table. If the database platform does not provide this functionality, or if the use case dictates, you may perform the update in transactional code instead of a trigger.
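For example, a trigger-maintained mirror might be sketched like this in Oracle PL/SQL (table and column names are invented for illustration):

CREATE TABLE customers_mirror AS SELECT * FROM customers WHERE 1 = 0;

CREATE OR REPLACE TRIGGER trg_customers_mirror
AFTER INSERT OR UPDATE OR DELETE ON customers
FOR EACH ROW
BEGIN
  IF INSERTING THEN
    INSERT INTO customers_mirror (id, name) VALUES (:NEW.id, :NEW.name);
  ELSIF UPDATING THEN
    UPDATE customers_mirror SET name = :NEW.name WHERE id = :NEW.id;
  ELSIF DELETING THEN
    DELETE FROM customers_mirror WHERE id = :OLD.id;
  END IF;
END;
/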
Materialized Views
A Materialized View contains the result of a query. With a regular database view, when the underlying table data changes, querying the view reflects those changes. However, with a materialized view the data is current only at the point in time of creation (or refresh) of the Materialized view. In simple terms, a materialized view is a snapshot of data at a point in time.
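In Oracle syntax, such a snapshot might be sketched as follows (names are illustrative):

CREATE MATERIALIZED VIEW customer_orders_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT c.id, c.name, COUNT(o.id) AS order_count
FROM   customers c
JOIN   orders o ON o.customer_id = c.id
GROUP  BY c.id, c.name;

-- The data stays as of the last refresh until you explicitly refresh it:
EXEC DBMS_MVIEW.REFRESH('CUSTOMER_ORDERS_MV');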

Replicated database for storing historical data

Only part of the data in the database is processed by the application; the rest is needed for reporting purposes, but it causes poor application performance. I would like to archive the historical data without modifying the database schema.
Is there a possibility to replicate database, delete old data from primary instance and regularly synchronise new changes into replicated database? That way primary "transactional" database will be lightweight and replicated database will contain full set of both current and historical data for reporting purposes.
Could you recommend some tools or give some tips to achieve that on Oracle?
edit:
I'm wondering if I could use Streams and somehow make a DML handler ignore DELETE operations on rows (docs.oracle.com/cd/B28359_01/server.111/b28321/…) so that during replication the historical rows are preserved despite being deleted from the transactional db.
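If that route were taken, the Streams mechanism would be a per-operation DML handler registered with DBMS_APPLY_ADM.SET_DML_HANDLER; a no-op handler for DELETE makes the apply process skip deletes. Roughly, as an untested sketch with hypothetical schema and object names:

CREATE OR REPLACE PROCEDURE strmadmin.skip_deletes (in_any IN ANYDATA) IS
BEGIN
  NULL;  -- never execute the DELETE LCR, so the row survives in the reporting copy
END;
/

BEGIN
  DBMS_APPLY_ADM.SET_DML_HANDLER(
    object_name    => 'APP.ORDERS',
    object_type    => 'TABLE',
    operation_name => 'DELETE',
    error_handler  => FALSE,
    user_procedure => 'STRMADMIN.SKIP_DELETES');
END;
/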
You don't need to create two separate databases. Just create one transactional database where you save all your transactions, and then create views based on those tables to show the required data. That way you only have to maintain one database.

How to partially migrate a database to a new system over time?

We are in the process of a multi-year project where we're building a new system and a new database to eventually replace the old system and database. The users are using the new and old systems as we're changing them.
The problem we keep running into is when an object in one system is dependent on an object in the other system. We've been using views, but have run into a limitation with one of the technologies (Entity Framework) and are considering other options.
The other option we're looking at right now is replication. My boss isn't excited about the extra maintenance that would cause. So, what other options are there for getting dependent data into the database that needs it?
Update:
The technologies we're using are SQL Server 2008 and Entity Framework. Both databases are within the same SQL Server instance, so linked servers shouldn't be necessary.
The limitation we're facing with Entity Framework is we can't seem to create the relationships between the table-based-entities and the view-based-entities. No relationship can exist in the database between a view and a table, as far as I know, so the edmx diagram can't infer it. And I cannot seem to create the relationship manually without getting errors. It thinks all columns in the view are keys.
If I leave it that way, I get an error like this for each column in the view:
Association End key property [...] is not mapped.
If I try to change the "Entity Key" property to false on the columns that are not the key, I get this error:
All the key properties of the EntitySet [...] must be mapped to all the key properties [...] of table viewName.
According to this forum post it sounds like a limitation of the Entity Framework.
Update #2
I should also mention the main limitation of the Entity Framework: it only supports one database at a time. So we need the old data to appear to be in the new database for Entity Framework to see it. We only need read access to the old system's data in the new system.
You can use linked server queries to leave the data where it is, but connect to it from the other db.
Depending on how up-to-date the data in each db needs to be, and whether one data source can remain read-only, you can:
Use the Database Copy Wizard to create an SSIS package that you can run periodically as a SQL Agent task
Use snapshot replication
Create a custom BCP in/out process to get the data to the other db
Use transactional replication, which can be near-realtime
If data needs to be read-write in both databases then you can use:
Transactional replication with updatable subscriptions
Merge replication
As you go down the list, the amount of work involved in maintaining the solution increases. Using linked server queries will work best if it's the right fit for what you're trying to achieve.
EDIT: If they're on the same server then, as suggested by another user, you should be able to access the table with servername.databasename.schema.tablename. It looks like it's an Entity Framework issue and not a db issue.
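For reference, a sketch of both cases with invented names:

-- Same instance: a plain three-part name is enough
SELECT c.CustomerId, c.Name
FROM   OldDatabase.dbo.Customers AS c;

-- Different instance: register a linked server first, then use a four-part name
EXEC sp_addlinkedserver
     @server     = N'OLDSERVER',
     @srvproduct = N'',
     @provider   = N'SQLNCLI',
     @datasrc    = N'oldserver.example.com';

SELECT c.CustomerId, c.Name
FROM   OLDSERVER.OldDatabase.dbo.Customers AS c;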
I don't know about LINQ to Entities, but I know that in LINQ to SQL you can connect to multiple databases/servers in one .dbml if you prefix the tables with:
ServerName.DatabaseName.SchemaName.TableName
MyServer.MyOldDatabase.dbo.Customers
I have been able to click on a table in the .dbml, copy and paste it into the .dbml of the alternate project, prefix the name, and set up the relationships, and it works... like I said, this was in LINQ to SQL, and I have not tried it with LINQ to Entities. I would give it a shot before you go through all the work of replication and such.
If LINQ to Entities cannot cross DBs, then replication, or something that emulates it, is the only thing that will work.
For performance purposes you probably want either Merge replication or Transactional with queued (not immediate) updating.
Thanks for the responses. We're going to try adding triggers to the old database tables to insert/update/delete records in the new tables of the new database. This way we can continue to use Entity Framework and also do any data transformations we need.
Once the UI functions move over to the new system for a particular feature, we'll remove the table from the old database and add a view to the old database with the same name that points to the new database table for backwards compatibility.
One thing that I realized needs to happen before we can do this: we have to search all our code and SQL for @@IDENTITY and replace it with SCOPE_IDENTITY() so the triggers don't mess up the IDs in the old system.
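To illustrate why, here is a rough T-SQL sketch with hypothetical table names (it assumes the new table also has an identity column):

-- Trigger on the old table mirrors inserts into the new database
CREATE TRIGGER trg_Orders_Mirror ON dbo.Orders
AFTER INSERT AS
BEGIN
    INSERT INTO NewDb.dbo.Orders (LegacyOrderId, CustomerId)
    SELECT i.OrderId, i.CustomerId FROM inserted AS i;
END;
GO

INSERT INTO dbo.Orders (CustomerId) VALUES (42);
SELECT @@IDENTITY;        -- last identity generated in the session: the trigger's insert into NewDb
SELECT SCOPE_IDENTITY();  -- identity generated in this scope: the row just added to dbo.Orders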
