SQL MDS Synchronization - sql-server

We have the following question about SQL Master Data Services:
We have already integrated data from different clients into MDS with the help of the MS Excel plugin. Now we want to push updated or newly added records back to the source database. Is this possible using MDS?
Is there any background sync process that automatically syncs data to and from the subscribers and MDS?

Create subscription views in MDS. Then you can leverage SSIS to pull the data from MDS using the subscription view and merge it into the source database.
Used in conjunction with Business Rules in MDS, the ETL can be coded to query only for validated members in MDS. This allows for a nice separation of concerns, so the ETL doesn't have to be overly complex about which data it needs to retrieve from MDS.
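For illustration, here is the kind of query an SSIS source component might run against a subscription view; the view and column names below are assumptions for this sketch, and the exact ValidationStatus value should be checked against your MDS version:

    -- Pull only members that have passed the model's business rules
    -- from a hypothetical subscription view named mdm.CustomerSubscriptionView.
    SELECT Code, Name, EmailAddress
    FROM mdm.CustomerSubscriptionView
    WHERE ValidationStatus = 'Validation Succeeded';

SSIS can then merge the result set into the corresponding source table.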

Related

Long running view in ssas-tabular

I have a SQL Server database where we have created some views based on dim and fact tables. I need to build an SSAS tabular model based on my tables and views, but one of the views takes 1.5 hours to run as a plain SQL query (in SSMS). I need to use this same view to build my SSAS tabular model, and 1.5 hours is not acceptable. The view is made up of more than 10 table joins and a lot of WHERE conditions.
1) Can I bring all the tables used in this view into my SSAS tabular model? I am not sure how to join them all and apply the WHERE clauses inside SSAS to build something similar to my view. Is that possible? If yes, how?
or
2) I build the SSAS model from that view once; if I then want to incrementally load the data daily, what is the best way to do that?
The best option is to set up a proper ETL process. That is:
Extract the tables from your source SQL database into a new SQL database that you control.
Transform the data into a star schema.
Load the data from the star schema into SSAS.
On SQL Server, the most common approach is to use SSIS packages for data extraction, movement, and orchestration, and SQL Server Agent jobs for scheduling.
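As a rough sketch of the scheduling piece, an Agent job can be created entirely in T-SQL (the job, schedule, and procedure names below are made up, and an SSIS-based step would use the SSIS subsystem rather than T-SQL):

    USE msdb;
    GO
    -- Hypothetical nightly job that runs an ETL stored procedure.
    EXEC dbo.sp_add_job @job_name = N'Nightly DW Load';
    EXEC dbo.sp_add_jobstep
        @job_name      = N'Nightly DW Load',
        @step_name     = N'Run ETL',
        @subsystem     = N'TSQL',
        @command       = N'EXEC dbo.usp_LoadStarSchema;',  -- assumed ETL procedure
        @database_name = N'ETL';
    EXEC dbo.sp_add_schedule
        @schedule_name     = N'Daily 2am',
        @freq_type         = 4,        -- daily
        @freq_interval     = 1,
        @active_start_time = 020000;   -- 02:00:00, encoded as HHMMSS
    EXEC dbo.sp_attach_schedule
        @job_name      = N'Nightly DW Load',
        @schedule_name = N'Daily 2am';
    EXEC dbo.sp_add_jobserver @job_name = N'Nightly DW Load';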
To answer your questions:
Yes, it is certainly possible to bring all of the tables from your source system directly into your tabular model, but please don't do this! You will only create problems for yourself later on when writing DAX calculations.
Incrementally loading data is something you decide per table imported into your tabular model. Again, this is much easier if you have a proper star schema, as you would typically run full processing on all your dimension tables and then do incremental processing only on the largest fact tables.
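To make the incremental pattern concrete, here is a hedged T-SQL sketch of a watermark-based extract for a large fact table (the table, columns, and watermark storage are all assumptions); the corresponding step in the tabular model would be a ProcessAdd on the fact table's partition while the dimensions are fully reprocessed:

    -- Watermark pattern: pull only fact rows changed since the last load.
    DECLARE @LastLoad datetime2 =
        (SELECT LastLoadedAt FROM etl.Watermark WHERE TableName = 'FactSales');

    INSERT INTO staging.FactSales (SalesOrderID, CustomerKey, Amount, ModifiedDate)
    SELECT s.SalesOrderID, s.CustomerKey, s.Amount, s.ModifiedDate
    FROM dbo.Sales AS s
    WHERE s.ModifiedDate > @LastLoad;

    -- Advance the watermark only as far as the data actually loaded.
    UPDATE w
    SET LastLoadedAt = COALESCE(
            (SELECT MAX(ModifiedDate) FROM staging.FactSales),
            w.LastLoadedAt)
    FROM etl.Watermark AS w
    WHERE w.TableName = 'FactSales';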

AWS Glue: SQL Server multiple partitioned databases ETL into Redshift

Our team is trying to create an ETL into Redshift to be our data warehouse for some reporting. We are using Microsoft SQL Server and have partitioned out our database into 40+ datasources. We are looking for a way to be able to pipe the data from all of these identical data sources into 1 Redshift DB.
Looking at AWS Glue, it doesn't seem possible to achieve this. Since they open up the job script to be edited by developers, I was wondering if anyone else has had experience with looping through multiple databases and transferring the same table into a single data warehouse. We are trying to avoid having to create a job for each database... unless we can programmatically loop through and create multiple jobs, one per database.
We've taken a look at DMS as well, which is helpful for getting the schema and current data over to Redshift, but it doesn't seem like it would solve the multiple partitioned data source issue either.
This sounds like an excellent use-case for Matillion ETL for Redshift.
(Full disclosure: I am the product manager for Matillion ETL for Redshift)
Matillion is an ELT tool - it will Extract data from your (numerous) SQL Server databases and Load it, via an efficient Redshift COPY, into staging tables (which can be stored inside Redshift in the usual way, or held on S3 and accessed from Redshift via Spectrum). From there you can add Transformation jobs to clean/filter/join (and much more!) into nice queryable star schemas for your reporting users.
If the table schemas on your 40+ databases are very similar (your question doesn't clarify how you are breaking your data down across those servers - horizontally or vertically), you can parameterise the connection details in your jobs and use iteration to run them over each source database, either serially or with a level of parallelism.
Pushing down transformations to Redshift works nicely because all of those transformation queries can utilize the power of a massively parallel, scalable compute architecture. Workload Management configuration can be used to ensure ETL and User queries can happen concurrently.
Also, you may have other sources of data you want to mash-up inside your Redshift cluster, and Matillion supports many more - see https://www.matillion.com/etl-for-redshift/integrations/.
You can use AWS DMS for this.
Steps:
Set up and configure a DMS instance.
Set up a target endpoint for Redshift.
Set up a source endpoint for each SQL Server instance (see https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html).
Set up a task for each SQL Server source; you can specify the tables to copy/synchronise, and you can use a transformation to specify which schema name(s) on Redshift you want to write to.
You will then have all of the data in identical schemas on Redshift.
If you want to query all of those together, you can do that either by running some transformation code inside Redshift to combine them into new tables, or you may be able to use views.
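If each source lands in its own Redshift schema, a simple UNION ALL view is one way to present them as a single table; the schema and table names below are assumptions for this sketch:

    -- Hypothetical schemas facility1..facility3; extend for all 40+ sources.
    CREATE VIEW reporting.orders_all AS
    SELECT 'facility1' AS source_db, * FROM facility1.orders
    UNION ALL
    SELECT 'facility2' AS source_db, * FROM facility2.orders
    UNION ALL
    SELECT 'facility3' AS source_db, * FROM facility3.orders;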

How to perform Lookups in Azure Data Factory?

I'm an SSIS developer. I do a lot of SQL stored procedure lookups in SSIS, but coming to Azure Data Factory I have no idea how to perform a lookup using a SQL stored procedure.
Could anyone please guide me on this?
Thanks in advance!
Jay
Azure Data Factory (ADF) is more of an ELT tool than an ETL tool, so direct lookups are not supported. Instead, this type of operation, along with other transforms, is pushed down into the compute you are actually using. For example, if you are moving data to SQL Server, Azure SQL Database or Azure SQL Data Warehouse, you would ensure all data is on the same server and use a Stored Procedure activity to execute the lookups using T-SQL and joins. If you are using Azure Data Lake Analytics (ADLA), you would use the U-SQL activity to run U-SQL or execute ADLA stored procedures, again doing lookups via joins or custom U-SQL code such as a Combiner, Applier, or Reducer. In fact, you can use any of the ADF compute options, like SQL, HDInsight (including Hive, Pig, MapReduce, Streaming and Spark script), Machine Learning, or custom .NET activities.
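As a hedged sketch of what such a pushed-down lookup might look like (all object names below are invented for illustration), a stored procedure called from ADF could enrich staged rows with a join instead of an SSIS Lookup:

    -- Staged rows are enriched ("looked up") with a LEFT JOIN against a
    -- dimension table, then written to the target fact table.
    CREATE PROCEDURE dbo.usp_LookupCustomerKeys
    AS
    BEGIN
        SET NOCOUNT ON;

        INSERT INTO dbo.FactOrders (OrderID, CustomerKey, Amount)
        SELECT s.OrderID,
               ISNULL(c.CustomerKey, -1) AS CustomerKey,  -- -1 = unknown member
               s.Amount
        FROM staging.Orders AS s
        LEFT JOIN dbo.DimCustomer AS c
            ON c.CustomerCode = s.CustomerCode;
    END;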
So you need to think about things differently with ADF. Have a look through this article to gain greater understanding of transforming data in ADF:
Transform data in Azure Data Factory
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-data-transformation-activities
As an aside, I would rarely use Lookups in SSIS, as performance in early versions used to be poor. Although this has improved in later versions, generally if you can do it in SQL you probably should. That pattern harnesses the power of SQL Server rather than dragging data up into the SSIS pipeline for the purposes of lookups (which are essentially joins) and pushing it back out again. I reserve Data Flow transformations mainly for cases where non-relational data is involved, e.g. XML, or joining your email server data with relational data. This is my personal view anyway : )

Best Options to manage large sets of data in SQL Server

I am currently working on a project which involves the following:
The application I am working on is connected to a SQL Server database.
SAP loads information into multiple tables (on a daily and also an hourly basis) in a MASTER database.
There are 5 other databases (hosted on the same server) that access this information via synonyms and stored procedure calls to the MASTER database.
The MASTER database is used purely for storing the data and routing it to the other databases.
Master Database -
Tables:
MASTER_TABLE1 <-- SAP inserts data into this table. Triggers process the valid data and insert it into a secondary staging table, say MASTER_TABLE1_SEC.
MASTER_TABLE1_SEC -- holds the processed data coming into MASTER_TABLE1.
FIVE other databases (one for each manufacturing facility) are present on the same server. My application is connected to the facility databases (not the Master):
FACILITY1
FACILITY2
...
FACILITY5
Synonyms for MASTER_TABLE1_SEC are created in each of these 5 facility databases.
Stored procedures are then called from the facility databases in order to load data from MASTER_TABLE1_SEC into the respective tables (within EACH facility) based on the business logic, roughly as sketched below.
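For context, the synonym piece of this setup looks roughly like the following; the database and table names follow the question's naming, while the procedure body and filter column are invented for illustration:

    -- Run inside each facility database: the synonym makes the MASTER
    -- staging table addressable as if it were a local object.
    CREATE SYNONYM dbo.MASTER_TABLE1_SEC
        FOR MASTER.dbo.MASTER_TABLE1_SEC;

    -- A facility-side load procedure can then select through the synonym.
    CREATE PROCEDURE dbo.usp_LoadFacilityData
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.FACILITY_TABLE1 (Col1, Col2)
        SELECT Col1, Col2
        FROM dbo.MASTER_TABLE1_SEC
        WHERE FacilityCode = 'FAC1';  -- hypothetical business-logic filter
    END;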
Is there a better architecture to handle this kind of project? I am a beginner when it comes to advanced data management. Can anyone suggest a better architecture or better tools to handle this?
There are a lot of patterns that would meet the needs described here. It seems that you are working with a type of data warehouse. I use Data Vault for my enterprise data warehouses. It is an Ensemble Modeling technique designed for integration and master data preparation. You can think of it as a way to house all data from all time. You would then generate Data Marts (Kimball method) for each of the facilities, containing only their data or whatever is required for their needs.

Syncing Data with Azure Data Factory

Is it possible to set up a pipeline in Azure Data Factory that performs a MERGE between the source and the destination rather than an INSERT? I have been able to successfully select data from my source on-prem table and insert it into my destination, but I'd really like to set up a pipeline that continually updates my destination with any changes in the source, e.g. copying new records that are added to the source, or updating any data which changes on an existing record.
I've seen references to the Data Sync framework, but from what I can tell that is only supported in the legacy portal. My V12 databases do not even show up in the classic Azure portal.
There is the Stored Procedure activity, which could handle this. You could use Data Factory to land the data in a staging table, then call the stored procedure to perform the MERGE, as sketched below. Otherwise, Data Factory logic is not that sophisticated, so you cannot perform a merge in the same way you could in SSIS, for example. Custom activities are probably not suitable for this, IMHO. This is also in line with Data Factory being ELT rather than ETL.
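A minimal sketch of that pattern, assuming ADF has already copied the source rows into a staging table (all object names below are hypothetical):

    -- Stored procedure invoked by ADF after the copy activity lands
    -- the data in staging; it upserts into the destination table.
    CREATE PROCEDURE dbo.usp_MergeCustomers
    AS
    BEGIN
        SET NOCOUNT ON;

        MERGE INTO dbo.Customer AS tgt
        USING staging.Customer AS src
            ON tgt.CustomerID = src.CustomerID
        WHEN MATCHED THEN
            UPDATE SET tgt.Name  = src.Name,
                       tgt.Email = src.Email
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (CustomerID, Name, Email)
            VALUES (src.CustomerID, src.Name, src.Email);

        TRUNCATE TABLE staging.Customer;  -- reset staging for the next run
    END;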
