Azure Data Factory - Copy Office 365 data to SQL Server

The goal is to extract data from Office 365 via Azure Data Factory into a SQL Server database.
It seems that it is only possible to sink the data from an Office 365 source into the following technologies:
Azure copy activity:
Sinks for Office 365 source:
How is that possible?

Currently, you can only copy data from Office 365 to Azure Blob Storage, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2, in JSON format.
You can copy the data to Blob Storage with a first copy activity and then chain a second copy activity that copies from storage to the SQL server.

Based on my understanding, you need to select the Binary format for the dataset when configuring the source and sink.
The sink then presents the sources that support the Binary format:
So the first step would be to copy the data into Blob Storage as a file, and then copy from that file into SQL.
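As a rough illustration of that second hop, here is a minimal Python sketch that reads the JSON file the first copy activity landed in Blob Storage and inserts the rows into a SQL Server table with pyodbc. The connection strings, container, blob, table and column names are placeholder assumptions, as is the line-delimited JSON layout.

# Sketch of the second hop: Blob Storage (JSON) -> SQL Server.
# Connection strings, container/blob names and the target table are placeholders.
import json

import pyodbc
from azure.storage.blob import BlobClient

# Download the JSON file that the first copy activity wrote to Blob Storage.
blob = BlobClient.from_connection_string(
    conn_str="<storage-connection-string>",
    container_name="office365-staging",
    blob_name="messages.json",
)
rows = [json.loads(line) for line in blob.download_blob().readall().splitlines() if line]

# Insert the rows into SQL Server (assumes flat objects with id/subject fields).
conn = pyodbc.connect("Driver={ODBC Driver 18 for SQL Server};Server=myserver;Database=mydb;UID=user;PWD=pwd")
cursor = conn.cursor()
cursor.fast_executemany = True
cursor.executemany(
    "INSERT INTO dbo.Messages (Id, Subject) VALUES (?, ?)",
    [(r["id"], r["subject"]) for r in rows],
)
conn.commit()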

Related

Azure Blob Container CSV to Azure Table Storage using Synapse

I have a large 700 GB CSV file in an Azure Blob Container.
I am using Azure Synapse to transform the column names and some of the data, and to sink it into a table.
I am unable to sink it to Table Storage in another Azure Data Lake Storage account.
Why can't I choose Azure Table Storage? Do I have to add an intermediate Parquet layer to store the data between the file and the table? Please assist.
Azure Table Storage is not supported as a data flow sink. Please find the supported sink types in the documentation here:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink
You can use the Copy Data tool to copy the data from Azure Blob Storage to Table Storage.
Kindly refer to the documentation below on copying and transforming data in Azure Blob Storage by using Azure Data Factory or Azure Synapse Analytics:
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory
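If you would rather script the copy than use the Copy Data tool, a rough Python sketch along these lines could work; for a 700 GB CSV you would stream the blob and batch the inserts, and the connection strings, container, blob and table names below are placeholder assumptions.

# Sketch: copy rows from a CSV in Blob Storage into Azure Table Storage.
# For a 700 GB file, stream the blob and batch the upserts instead of loading it all.
import csv
import io

from azure.data.tables import TableServiceClient
from azure.storage.blob import BlobClient

# Placeholder names for the container, blob and table.
blob = BlobClient.from_connection_string(
    conn_str="<blob-connection-string>",
    container_name="input",
    blob_name="data.csv",
)
reader = csv.DictReader(io.StringIO(blob.download_blob().content_as_text()))

tables = TableServiceClient.from_connection_string("<table-connection-string>")
table = tables.create_table_if_not_exists("target")

for i, row in enumerate(reader):
    # Every entity needs a PartitionKey and RowKey; here they are derived naively.
    table.upsert_entity({"PartitionKey": "csv", "RowKey": str(i), **row})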

Read Parquet file in filesystem to on-prem SQL Server Table

There is a large amount of documentation available on getting data from a Parquet file in Azure Blob Storage into an on-prem MSSQL instance.
E.g. https://www.c-sharpcorner.com/article/external-tables-in-sql-server-2016-or-2017-and-work-with-polybase/ on creating an external data source.
-- Creating External Data Source
CREATE EXTERNAL DATA SOURCE PolybaseDS
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://parquetcontainer@polybaseexternaldata.blob.core.windows.net', -- Please change to your container and storage account
    CREDENTIAL = AzureStorageCredential
);
Is there an equivalent method or way of using external tables to pull data from a Parquet file in the local filesystem rather than Blob Storage?
SSIS/PowerShell solutions would also be great, but SSIS doesn't have a built-in Parquet connector, so that would probably be a slightly contrived C# script task. As a test I managed to import the data using Power Query, but it's not great for automation.
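Outside of PolyBase and SSIS, one pragmatic option is a small Python script using pandas (with pyarrow) and SQLAlchemy. A minimal sketch, with placeholder file path, server, database, driver and table names:

# Sketch: load a Parquet file from the local filesystem into an on-prem SQL Server table.
# File path, server, database and table names are placeholders.
import pandas as pd
from sqlalchemy import create_engine

# Read the local Parquet file (needs pyarrow or fastparquet installed).
df = pd.read_parquet(r"C:\data\myfile.parquet")

# Connect to the on-prem SQL Server via ODBC.
engine = create_engine(
    "mssql+pyodbc://user:password@MYSERVER/MyDatabase?driver=ODBC+Driver+18+for+SQL+Server"
)

# Append the rows to an existing table; chunksize keeps memory use bounded for large files.
df.to_sql("MyParquetTable", engine, schema="dbo", if_exists="append", index=False, chunksize=10_000)

A script like this could be scheduled outside of SSIS, which may be enough for the automation requirement.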

Get Last Modified Azure Data Lake into table column SQL With SSIS

I want to ask something about SSIS. I have an SSIS solution with a task whose source is Azure Data Lake and whose destination is an Azure SQL Database.
In the Azure SQL Database, I have added a DATEMODIFIED column that should be filled from the Last Modified date in Azure Data Lake ...
The question is: can I get the last modified date of the CSV in Azure Data Lake into the Azure SQL DB table?
Thanks in advance
You will need a script task in your package; use the System.IO.FileInfo object in the script task to get the file properties, including the file's last modified date.
You could also use a REST API call or an SDK, as suggested in this SO thread.
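If you go the SDK route, a minimal Python sketch is shown below; it reads the Last Modified timestamp with the Data Lake Storage Gen2 SDK and writes it into the DATEMODIFIED column. The account, file system, file path, SQL connection and table/column names are placeholder assumptions.

# Sketch: read a file's Last Modified timestamp from Azure Data Lake Storage Gen2
# and write it into a DATEMODIFIED column in Azure SQL Database.
# Account, file system, path and SQL connection details are placeholders.
import pyodbc
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient.from_connection_string("<adls-connection-string>")
file_client = service.get_file_system_client("myfilesystem").get_file_client("in/data.csv")
last_modified = file_client.get_file_properties().last_modified  # timezone-aware datetime

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};Server=myserver.database.windows.net;Database=mydb;UID=user;PWD=pwd"
)
conn.cursor().execute(
    "UPDATE dbo.MyTable SET DATEMODIFIED = ? WHERE FileName = ?",
    (last_modified, "data.csv"),
)
conn.commit()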

How to Import data into Azure SQL database from Data Lake Storage Gen 1?

I created an empty SQL database using Azure portal.
I also added some sample data to a data lake in Data Lake Storage Gen 1.
I downloaded SSMS, linked it to the server containing the SQL database, and added a new table using SSMS in order to have a target location to import the data into the SQL database.
Questions: 1. Will the new table I added in SSMS be recognized in Azure? 2. How do I get the sample data from the data lake I created into the new table I created in the Azure SQL database?
An article suggested using Azure HDInsight to transfer the data, but it's not part of my free subscription and I don't know how much I would be charged for using it.
Will the new table I added in SSMS be recognized in Azure?
Yes. When we connect to the Azure SQL database with SSMS, all operations in the query editor run directly against the Azure SQL database. Once the SQL statements have executed, we can refresh SSMS and the new table will be there.
How do I get the sample data from the data lake I created into the new table I created in the Azure SQL database?
What sample data did you add to Data Lake Storage Gen1, CSV files? If the files are not very large, I would suggest downloading them to your local computer and then using the SSMS Import and Export Wizard to load the data into the Azure SQL database. It may take some time, but it's free.
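If you would rather script that download-and-load step than use the Import and Export Wizard, a rough Python sketch with the Data Lake Storage Gen1 SDK might look like the following; the tenant/client credentials, store name, file path and target table are placeholder assumptions.

# Sketch: read a CSV from Data Lake Storage Gen1 and load it into an Azure SQL table.
# Tenant/client credentials, store name, file path and target table are placeholders.
import pandas as pd
from azure.datalake.store import core, lib
from sqlalchemy import create_engine

# Authenticate with a service principal and connect to the Gen1 store.
token = lib.auth(tenant_id="<tenant>", client_id="<app-id>", client_secret="<secret>")
adl = core.AzureDLFileSystem(token, store_name="mydatalakestore")

# Read the sample CSV straight into a DataFrame.
with adl.open("/samples/data.csv", "rb") as f:
    df = pd.read_csv(f)

# Load it into the table created earlier in SSMS.
engine = create_engine(
    "mssql+pyodbc://user:password@myserver.database.windows.net/mydb?driver=ODBC+Driver+18+for+SQL+Server"
)
df.to_sql("SampleTable", engine, schema="dbo", if_exists="append", index=False)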
Or you could follow the tutorial Copy data between Data Lake Storage Gen1 and Azure SQL Database using Sqoop. I didn't find the price of Azure HDInsight when creating it, so I'm not sure whether it's free, but you could try it.
Hope this helps.

Copying on premise SQL server database data to Azure in Parquet format

Architectural/perf question here.
I have an on-premises SQL Server database which has ~200 tables of ~10 TB total.
I need to make this data available in Azure in Parquet format for Data Science analysis via HDInsight Spark.
What is the optimal way to copy/convert this data to Azure (Blob storage or Data Lake) in Parquet format?
Due to the manageability aspect of the task (since it involves ~200 tables), my best shot was: extract the data locally to a file share via sqlcmd, compress it as csv.bz2, and use Data Factory to copy the file share (with 'PreserveHierarchy') to Azure. Finally, run pyspark to load the data and then save it as .parquet.
Given the table schemas, I can auto-generate the SQL data-extract and Python scripts from the SQL database via T-SQL.
Are there faster and/or more manageable ways to accomplish this?
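For the pyspark step described above, the conversion itself is short. A minimal sketch, assuming the extracts have headers and land in one folder per table; the storage account, container names and table list are placeholders.

# Sketch of the pyspark step: read the compressed CSV extracts and rewrite them as Parquet.
# Paths and the table list are placeholders; Spark decompresses .bz2 transparently.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

tables = ["dbo.Customers", "dbo.Orders"]  # in practice, generated from the schema

for table in tables:
    df = spark.read.csv(
        f"wasbs://staging@mystorageaccount.blob.core.windows.net/{table}/*.csv.bz2",
        header=True,
        inferSchema=True,
    )
    df.write.mode("overwrite").parquet(
        f"wasbs://curated@mystorageaccount.blob.core.windows.net/parquet/{table}"
    )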
ADF matches your requirement perfectly, with both one-time and schedule-based data movement.
Try the Copy Data wizard in ADF. With it, you can move on-prem SQL directly to Blob/ADLS in Parquet format with just a couple of clicks.
Copy Activity Overview
