Azure Blob Container CSV to Azure Table Storage using Synapse

I have a large 700 GB CSV file in an Azure Blob Container.
I am using Azure Synapse to transform column names and some of the data, and to sink it into a table.
I am unable to sink it to Table Storage in another Azure Data Lake Storage account.
Why can't I choose Azure Table Storage? Do I have to add an intermediate Parquet layer to stage the data between the file and the table? Please assist.

Why can't I choose Azure Table Storage? Do I have to add an intermediate Parquet layer to stage the data between the file and the table?
Azure Table Storage is not supported as a data flow sink. Please find the supported sink types in the documentation:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink
You can instead use the Copy Data tool (a copy activity) to move the data from Azure Blob Storage to Table Storage.
Refer to the documentation on copying and transforming data in Azure Blob Storage using Azure Data Factory or Azure Synapse Analytics:
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory
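If the copy ever needs to be scripted outside the designer, a minimal Python sketch of the same Blob-CSV-to-Table-Storage copy could look like the following; the connection strings, container, blob, table, and the Region/cust_name columns are all placeholders, and a real 700 GB file would have to be streamed or partitioned rather than read in one call:

import csv
import io

from azure.data.tables import TableServiceClient
from azure.storage.blob import BlobServiceClient

# Placeholder connection strings -- replace with your own.
BLOB_CONN = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
TABLE_CONN = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

# Download the CSV from Blob Storage (readall() is only reasonable for small
# files; a 700 GB file would have to be streamed or split into partitions).
blob_client = BlobServiceClient.from_connection_string(BLOB_CONN) \
    .get_blob_client(container="source-container", blob="data.csv")
csv_text = blob_client.download_blob().readall().decode("utf-8")

# Write each row to Azure Table Storage, renaming columns on the way.
table_client = TableServiceClient.from_connection_string(TABLE_CONN) \
    .create_table_if_not_exists("TargetTable")

for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
    table_client.upsert_entity({
        "PartitionKey": row["Region"],     # hypothetical partitioning column
        "RowKey": str(i),
        "CustomerName": row["cust_name"],  # example of a column rename
    })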

Related

External table in Snowflake should automatically read and point to the latest files created in Azure Data Lake Storage

Every time my Azure pipeline runs, a new file gets created in Azure Data Lake Storage, and I want the external table I have already created for this table to point to the latest file created in the data lake.
I have multiple Parquet files of the same table in blob storage, and we want to read the latest Parquet file through an external table in Snowflake.
Have you checked out this section in the Snowflake documentation? It covers the steps required to configure automatic refresh of external tables using Azure Event Grid. If this is not suitable for your use case, can you provide more detail on your issue and why it doesn't fit?
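As a rough illustration of what that auto-refresh setup involves (the integration name, queue URI, tenant ID, stage, and table names below are placeholders, and the exact syntax should be checked against the linked Snowflake documentation), it amounts to two statements, shown here issued through the Snowflake Python connector:

import snowflake.connector

# Placeholder credentials -- replace with your own account details.
conn = snowflake.connector.connect(user="ME", password="...", account="my_account")
cur = conn.cursor()

# 1. A notification integration pointing at the Event Grid storage queue.
cur.execute("""
    CREATE NOTIFICATION INTEGRATION adls_notify_int
      ENABLED = TRUE
      TYPE = QUEUE
      NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
      AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://myaccount.queue.core.windows.net/snowflake-queue'
      AZURE_TENANT_ID = '<tenant-id>'
""")

# 2. An external table over the existing stage with AUTO_REFRESH enabled,
#    so newly landed Parquet files are picked up without manual refreshes.
cur.execute("""
    CREATE OR REPLACE EXTERNAL TABLE my_ext_table
      WITH LOCATION = @my_adls_stage
      AUTO_REFRESH = TRUE
      INTEGRATION = 'ADLS_NOTIFY_INT'
      FILE_FORMAT = (TYPE = PARQUET)
""")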

Azure Data Factory - Copy Office data to SQL Server

The goal is to extract data from Office 365 via Azure Data Factory into a SQL Server.
It seems that it is only possible to sink data from an Office 365 source into a limited set of technologies (screenshots of the copy activity and of the sinks offered for the Office source omitted).
How is that possible?
Currently, you can only copy data from Office 365 to Azure Blob Storage, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2 in JSON format.
You can copy the data to Blob Storage with a first copy activity and then chain a second copy activity that copies from storage into the SQL Server.
Based on my understanding, you need to select a Binary-format dataset when configuring the source and sink, so the sink is presented with the binary-format options (screenshot omitted).
So the first step would be to copy the data into Blob Storage as a file, and the second step would be to copy from that file into SQL.
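If that second hop ever needs to be scripted rather than handled by a second copy activity, a rough Python sketch of the Blob-to-SQL leg might look like this; the connection strings, container, blob, table, and column names are placeholders, and it assumes the extracted Office 365 data landed as JSON Lines:

import json

import pyodbc
from azure.storage.blob import BlobServiceClient

# Placeholder connection details -- replace with your own.
BLOB_CONN = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"
SQL_CONN = ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
            "DATABASE=mydb;UID=me;PWD=...")

# Read the JSON Lines file produced by the first copy activity.
blob = BlobServiceClient.from_connection_string(BLOB_CONN) \
    .get_blob_client(container="office-staging", blob="messages.json")
lines = blob.download_blob().readall().decode("utf-8").splitlines()

# Insert the records into the SQL Server target table.
with pyodbc.connect(SQL_CONN) as cnn:
    cur = cnn.cursor()
    cur.fast_executemany = True
    rows = [(m.get("Id"), m.get("Subject")) for m in map(json.loads, lines)]
    cur.executemany("INSERT INTO dbo.Messages (Id, Subject) VALUES (?, ?)", rows)
    cnn.commit()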

Read Parquet file in filesystem to on-prem SQL Server Table

There is a large amount of documentation available on getting data from a Parquet file in Azure Blob Storage to an on-prem MSSQL instance.
E.g. https://www.c-sharpcorner.com/article/external-tables-in-sql-server-2016-or-2017-and-work-with-polybase/ shows creating an external data source:
-- Creating External Data Source
CREATE EXTERNAL DATA SOURCE PolybaseDS
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://parquetcontainer@polybaseexternaldata.blob.core.windows.net', -- change to your container and storage account
    CREDENTIAL = AzureStorageCredential
);
Is there an equivalent method or way of using external tables to pull data from a Parquet file on the local filesystem rather than from Blob Storage?
SSIS/PowerShell solutions would also be great, but SSIS doesn't have a built-in Parquet connector, so that would probably end up as a slightly contrived C# script task. As a test I managed to import the data using Power Query, but it's not great for automation.
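Not an answer to the external-table part, but as a sketch of a scriptable alternative: a few lines of Python with pandas/pyarrow and SQLAlchemy can push a local Parquet file into an on-prem table (the file path, server, credentials, and table name below are placeholders):

import pandas as pd
from sqlalchemy import create_engine

# Read the Parquet file from the local filesystem (needs pyarrow or fastparquet).
df = pd.read_parquet(r"C:\data\export.parquet")

# Connect to the on-prem SQL Server instance via ODBC (placeholder credentials).
engine = create_engine(
    "mssql+pyodbc://me:password@MYSERVER/MyDatabase"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Append the rows into the target table in batches.
df.to_sql("ParquetImport", engine, schema="dbo", if_exists="append",
          index=False, chunksize=10_000)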

How to Import data into Azure SQL database from Data Lake Storage Gen 1?

I created an empty SQL database using Azure portal.
I also added some sample data to a data lake in Data Lake Storage Gen 1.
I downloaded SSMS, linked it to the server containing the SQL database, and added a new table using SSMS in order to have a target location to import the data into the SQL database.
Questions: 1. Will the new table I added in SSMS be recognized in Azure? 2. How do I get the sample data from the data lake I created into the new table I created in the Azure SQL database?
An article suggested using Azure HDInsight to transfer the data, but it's not part of my free subscription and I don't know how much I would be charged for using it.
Will the new table I added in SSMS be recognized in Azure?
Yes. When we connect to the Azure SQL database with SSMS, all the operations in the query editor run against the Azure SQL database itself. Once the SQL statements have executed, we can refresh SSMS and the new table will be there.
How do I get the sample data from the data lake I created into the new table I created in the Azure SQL database?
What is the sample data you added to Data Lake Storage Gen1, CSV files? If the files are not very large, I would suggest downloading them to your local computer and then using the SSMS Import and Export Wizard to load the data into the Azure SQL database. It may take some time, but it's free.
Alternatively, you could follow the tutorial Copy data between Data Lake Storage Gen1 and Azure SQL Database using Sqoop. I didn't find the price of Azure HDInsight when creating it, so I'm not sure whether it's free, but you could try it.
Hope this helps.
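If downloading by hand is inconvenient, a rough Python sketch of the same download-then-load idea could look like this (the tenant/client credentials, store name, file path, and table/column names are all placeholders):

import csv

import pyodbc
from azure.datalake.store import core, lib, multithread

# Authenticate against the Data Lake Storage Gen1 account with a service principal.
token = lib.auth(tenant_id="<tenant-id>", client_id="<client-id>",
                 client_secret="<client-secret>")
adl = core.AzureDLFileSystem(token, store_name="mydatalakestore")

# Download the sample CSV to the local machine.
multithread.ADLDownloader(adl, rpath="/samples/data.csv",
                          lpath="data.csv", overwrite=True)

# Load the rows into the table created earlier in the Azure SQL database.
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=myserver.database.windows.net;DATABASE=mydb;"
                      "UID=me;PWD=...")
cur = conn.cursor()
with open("data.csv", newline="") as f:
    for row in csv.DictReader(f):
        cur.execute("INSERT INTO dbo.SampleData (Col1, Col2) VALUES (?, ?)",
                    row["col1"], row["col2"])
conn.commit()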

Copy image field data from SQL Server to Azure Blob Storage as blob blocks

I have a table in SQL Server with the following structure:
CREATE TABLE dbo.Documents (
    Id INT IDENTITY
    ,Name VARCHAR(50) NOT NULL
    ,Data IMAGE NOT NULL
    ,CONSTRAINT PK_Documents_DocumentId PRIMARY KEY CLUSTERED (Id)
)
Is there any way to extract the data from this table and load it into Azure Blob Storage, preserving the following structure:
import-results
    document-id
        document-name
where
- document-id is the Id field of the document from the Documents table
- document-name is the Name field of the document from the Documents table
- the content of the blob is the Data field from the Documents table
I tried looking into Data Factory, but it looks like it can only load the results of a query into a file, and I couldn't find a way to configure the procedure described above.
P.S. This is part of a migration from on-prem data to the cloud; the goal is to avoid storing terabytes of documents in the SQL Server database by replacing them with blobs in Blob Storage.
If what you want is just to avoid storing terabytes of documents in the SQL Server database, I suggest the following steps.
Step 1: Retrieve the image data and upload it to Azure Blob Storage as files (a minimal sketch follows after these steps).
Step 2: Use Azure Data Factory to import the remaining data from the SQL database into Azure Table Storage.
Step 3: Store the URLs of the blobs uploaded to Azure Blob Storage in place of the Data field.
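A minimal Python sketch of Step 1, assuming the dbo.Documents table from the question and placeholder connection strings and container name, could look like this:

import pyodbc
from azure.storage.blob import BlobServiceClient

# Placeholder connection details -- replace with your own.
SQL_CONN = ("DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
            "DATABASE=mydb;UID=me;PWD=...")
BLOB_CONN = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net"

# Container laid out as import-results/<document-id>/<document-name>
# (assumes the container already exists).
container = BlobServiceClient.from_connection_string(BLOB_CONN) \
    .get_container_client("import-results")

# Read each document out of the IMAGE column and upload it to Blob Storage.
with pyodbc.connect(SQL_CONN) as conn:
    cur = conn.cursor()
    cur.execute("SELECT Id, Name, Data FROM dbo.Documents")
    for doc_id, name, data in cur:
        container.upload_blob(name=f"{doc_id}/{name}", data=bytes(data),
                              overwrite=True)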
Hope it helps you.
