SSIS copy multiple tables - sql-server

I have more than 200 MSSQL tables, and want to transfer data to Azure Data Lake Storage.
One approach I am considering is to use SSIS with dynamic data flows, i.e. create a table-name variable, do a foreach loop over the table names, and run a data flow for each table. However, this approach seems wrong: although files are created in Data Lake Storage with the correct schemas, no data is transferred because the column mappings are wrong.
Is there any generic way to create one dynamic data flow and transfer the data for a huge number of tables?

The scenario you are describing can now be achieved in ADF V2. V2 added a set of rich control-flow enhancements, including the Lookup activity, parameter passing, and ForEach looping. You can see a tutorial on how to accomplish this here: https://learn.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy
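For illustration only (this is not taken from the tutorial itself), the Lookup activity in that pattern typically runs a metadata query like the one below against the source database; the ForEach activity then iterates over the result set and passes each schema/table name to a parameterized Copy activity. The query is a plain T-SQL sketch against the source system views.

```sql
-- Hypothetical Lookup-activity query: list every user table to copy.
-- The ForEach activity iterates over this result set and hands
-- TABLE_SCHEMA / TABLE_NAME to a parameterized Copy activity.
SELECT
    TABLE_SCHEMA,
    TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
```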

Related

Looping Through Tables in a DB in Informatica

I am looking for a way in Informatica to pull data from a table in a database, load it into Snowflake, then move on to the next table in that same DB, and repeat that for the remaining tables in the database.
We currently have this set up running in Matillion, where an orchestration grabs all of the table names in a database and then loops through each of those tables to send the data into Snowflake.
My team and I have tried to ask Informatica Global Support, but they have not been very helpful in figuring out how to accomplish this. They have suggested things like Dynamic Mapping, which I do not think will work for our particular case, since we are essentially trying to get data from one database into a Snowflake database and do not need to do any other transformations.
Please let me know if any additional clarification is needed.
Dynamic Mapping Task is your answer. You create one mapping, with or without transformations, as you need. Then you set up a Dynamic Mapping Task to execute the mapping across the whole set of your 60+ different sources and targets.
Please note that this is available as part of the Cloud Data Integration module of IICS. It's not available in PowerCenter.

How to move data from S3 to Snowflake

I have a few questions regarding the process of copying tables from S3 to Snowflake.
The plan is to copy some data from AWS/S3 onto Snowflake and then perform some modeling with DataRobot.
We have some tables that contain PII data and we would like to hide those columns from DataRobot. What suggestions do you have for this problem?
Does the schema in AWS need to match the schema in Snowflake for the copying process?
Thanks,
Mali
Assuming you know the schema of the data you are loading, you have a few options for using Snowflake:
Use COPY INTO statements to load the data into the tables
Use SNOWPIPE to auto-load the data into the tables (this would be good for instances where you are regularly loading new data into Snowflake tables)
Use EXTERNAL TABLES to reference the S3 data directly as a table in Snowflake. You'd likely want to use MATERIALIZED VIEWS for this in order for the tables to perform better.
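Purely as a sketch of those three options, and assuming an external stage (hypothetically named my_s3_stage) already exists over the relevant S3 path and the target table is already defined, the statements might look roughly like this:

```sql
-- Sketch only: stage, table, and file-format settings are illustrative.

-- Option 1: bulk load the staged files into a table.
COPY INTO customers
FROM @my_s3_stage/customers/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Option 2: auto-ingest new files as they land in S3 (Snowpipe).
CREATE OR REPLACE PIPE customers_pipe AUTO_INGEST = TRUE AS
  COPY INTO customers
  FROM @my_s3_stage/customers/
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

-- Option 3: query the staged files in place as an external table.
CREATE OR REPLACE EXTERNAL TABLE customers_ext
  WITH LOCATION = @my_s3_stage/customers/
  AUTO_REFRESH = FALSE
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```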
As for hiding the PII data from DataRobot, I would recommend leveraging Snowflake DYNAMIC DATA MASKING to establish rules that obfuscate the data (or null it out) for the role that DataRobot is using.
All of these features are well-documented in Snowflake documentation:
https://docs.snowflake.com/
Regarding hiding your PII elements, you can use two different roles: one would be, say, data_owner (the role that creates the table and loads the data into it) and another, say, data_modelling (the role DataRobot uses).
Create masking policies as the data_owner role such that the DataRobot role cannot see the column data.
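A minimal sketch of such a policy, assuming a hypothetical PII column named email on a customers table and the role names described above:

```sql
-- Sketch only: role, table, and column names are illustrative.
CREATE OR REPLACE MASKING POLICY mask_pii AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('DATA_OWNER') THEN val
    ELSE '***MASKED***'
  END;

-- Attach the policy to the PII column; the DATA_MODELLING role
-- (used by DataRobot) will only ever see the masked value.
ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY mask_pii;
```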
About your question on copying the data: there is no requirement that the AWS S3 folder structure be in sync with Snowflake. You can create the external stage with any name and point it at any S3 folder.
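For example, a stage definition might look roughly like the following; the stage name, bucket path, and storage integration are placeholders:

```sql
-- Sketch only: all names and the storage integration are placeholders.
CREATE OR REPLACE STAGE my_s3_stage
  URL = 's3://my-bucket/any/folder/you/like/'
  STORAGE_INTEGRATION = my_s3_integration
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);
```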
The Snowflake documentation has a good example which helps to get some hands-on experience:
https://docs.snowflake.com/en/user-guide/data-load-s3.html

What is the process to transfer staging table data to fact tables in Snowflake with custom validations

Good day.
I need help. I want to transfer data in Snowflake from staging tables to fact tables automatically, whenever data is available in a staging table. While moving data from the staging tables to the fact tables, I have a couple of custom validations on each column and row.
Any idea how to do this in Snowflake?
If anyone knows, could you please suggest an approach?
Thanks in advance!
There are many ways to do this and how you go about it depends on what tools you have available. The simplest way to do this without using tools outside of the Snowflake ecosystem would be:
Set up a stream on each of the staging tables you have (see the Snowflake documentation on streams).
Create a task that runs on a schedule (see the Snowflake documentation on tasks) to pull from the stream and write into the fact table.
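A minimal sketch of that pattern, with hypothetical table, stream, task, and warehouse names:

```sql
-- Sketch only: all object names are hypothetical.
CREATE OR REPLACE STREAM stg_orders_stream ON TABLE stg_orders;

-- Run every 5 minutes, but only do work when the stream has new rows.
CREATE OR REPLACE TASK load_fact_orders
  WAREHOUSE = etl_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('STG_ORDERS_STREAM')
AS
  INSERT INTO fact_orders (order_id, customer_id, amount)
  SELECT order_id, customer_id, amount
  FROM stg_orders_stream
  WHERE METADATA$ACTION = 'INSERT';   -- only newly inserted rows

-- Tasks are created suspended; resume to start the schedule.
ALTER TASK load_fact_orders RESUME;
```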
This is really a general data warehousing question rather than a Snowflake one. There is also some good documentation on building SCD type 2 dimensions, written by someone at Snowflake.
Assuming "staging tables" refers to a Snowflake table and not a file in a Snowflake stage, I would recommend using a Stream and Task for this. A stream will identify the delta of data that needs to be loaded, and a Task can execute on a schedule and will only actually run something if there is data in the stream. Create a stored procedure that is executed in the Task to run your validations and Merge the outcome of those into your Fact.

Azure Data Factory multiple tables

I have a business scenario: we have to pull all the tables from one database, let's say AdventureWorks, and put each table's data in a separate CSV in the data lake. Suppose the AdventureWorks DB has 20 tables; I need to pull all the tables in parallel, with one CSV per table, i.e. 20 tables will produce 20 CSVs in Azure Data Lake. How can I do this using Azure Data Factory? Kindly don't suggest the for-each activity; it processes the files sequentially and is time-consuming.
In Data Factory, there are two ways to create 20 CSV files from 20 tables in one pipeline: the ForEach activity and Data Flow.
In a Data Flow, add 20 sources and 20 sinks.
No matter which way you choose, the copy activities run sequentially and take some time.
What you should do is think about how to improve the copy performance, as Thiago Gustodio said in a comment; it can help you save time.
For example, set more DTUs for your database, or use more DIUs for your copy activity.
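As a small illustration (not part of the original answer), scaling up an Azure SQL Database's DTU tier can be done with a T-SQL statement like the following; the database name and service objective are placeholders:

```sql
-- Sketch only: database name and tier are placeholders.
-- Scales an Azure SQL Database to the Standard S3 service objective
-- before the bulk copy, to give the source more throughput.
ALTER DATABASE [AdventureWorks] MODIFY (SERVICE_OBJECTIVE = 'S3');
```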
Please reference these Data Factory documents:
Mapping data flows performance and tuning guide
Copy activity performance and scalability guide
They both provide performance guidance for you.
Hope this helps.

Moving SQL server data into Dynamics 365 CRM

I have more than 500 tables in SQL Server that I want to move to Dynamics 365. I am using SSIS so far. The problem with SSIS is that the destination entity of Dynamics CRM has to be specified along with its mappings, and hence it would be foolish to create separate data flows for entities for hundreds of SQL Server table sources. Is there any better way to accomplish this?
I am new to SSIS. I don't feel this is the correct approach; I am just simulating the import/export wizard of SQL Server. Please let me know if there are better ways.
It's amazing how often this gets asked!
SSIS cannot have dynamic data flows because the buffer size (the pipeline) is calculated at design time (as opposed to execution time).
The only way you can re-use a data flow is if all the source-to-target mappings are the same, e.g. if you have two tables with exactly the same DDL structure.
One option (horrible IMO) is to concatenate all columns into a massive pipe-separated VARCHAR and write this to a custom staging table in your destination with two columns, e.g. (table_name, column_dump), and then "unpack" this in your target system via a post-load SQL statement.
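Purely to illustrate the shape of that idea (not a recommendation), the staging table and a per-table "pack" query might look something like this in T-SQL; all names are made up, and a dynamic version of the INSERT would be generated for each source table:

```sql
-- Sketch only: table and column names are illustrative.
CREATE TABLE dbo.stg_column_dump (
    table_name  SYSNAME      NOT NULL,
    column_dump VARCHAR(MAX) NOT NULL
);

-- Example "pack" query for one source table: every column is cast to
-- text and concatenated with a pipe separator (CONCAT_WS requires
-- SQL Server 2017 or later).
INSERT INTO dbo.stg_column_dump (table_name, column_dump)
SELECT
    'dbo.Customer',
    CONCAT_WS('|', CAST(CustomerID AS VARCHAR(50)),
                   FirstName,
                   LastName,
                   CAST(ModifiedDate AS VARCHAR(30)))
FROM dbo.Customer;
```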
I'd bite the bullet, put on your headphones and start churning out the SSIS dataflows one by one - you'd be surprised how quick you can bang them out!
ETL works that way: you have to map the source, the destination, and the column mappings. If you want that to be dynamic, it's possible with an Execute SQL Task inside a Foreach Loop container. Read more
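As a rough sketch (not from the original answer), the statement run by such an Execute SQL Task could build dynamic SQL from the current table name; here the table name is hard-coded for illustration, whereas in the package it would be mapped in from the SSIS foreach-loop variable, and the staging-schema target is an assumption:

```sql
-- Sketch only: in the package, @TableName would be populated from the
-- SSIS foreach-loop variable (e.g. via a ? parameter mapping).
DECLARE @TableName SYSNAME = N'Customer';
DECLARE @Sql NVARCHAR(MAX) =
    N'INSERT INTO staging.' + QUOTENAME(@TableName) +
    N' SELECT * FROM dbo.' + QUOTENAME(@TableName) + N';';

EXEC sys.sp_executesql @Sql;
```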
But when you are using the Kingswaysoft CRM destination connector, this is a little tricky (it may or may not be possible?), as it needs very specific column mappings between source and destination.
Also, when the source schema comes from OLE DB, it is better to have separate Data Flow Tasks for each table.
