There is a large amount of documentation available on getting data from a Parquet file in Azure Blob Storage into an on-prem MSSQL instance.
E.g. https://www.c-sharpcorner.com/article/external-tables-in-sql-server-2016-or-2017-and-work-with-polybase/ shows creating an external data source:
-- Creating External Data Source
CREATE EXTERNAL DATA SOURCE PolybaseDS
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://parquetcontainer@polybaseexternaldata.blob.core.windows.net', -- Please change to your container and storage account
    CREDENTIAL = AzureStorageCredential
);
Is there an equivalent method, or another way of using external tables, to pull data from a Parquet file on the local filesystem rather than from blob storage?
SSIS/PowerShell solutions would also be great, but SSIS doesn't have a built-in Parquet connector, so that would probably be a slightly contrived C# script task. As a test I managed to import the data using Power Query, but it's not great for automation.
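For reference, the kind of scripted workaround I have in mind if no external-table route exists is a small Python job along these lines (a rough sketch only; the file path, server, and table names are made up, and it assumes pandas/pyarrow plus sqlalchemy/pyodbc are installed):

import pandas as pd                     # assumes pandas + pyarrow are installed
from sqlalchemy import create_engine    # and sqlalchemy + pyodbc for the SQL Server side

# Hypothetical path and names -- adjust to your environment.
df = pd.read_parquet(r"C:\data\myfile.parquet")   # reads the local Parquet file via pyarrow

engine = create_engine(
    "mssql+pyodbc://myserver/MyDatabase"
    "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes"
)
# Load into a staging table; if_exists="append" keeps an existing table's schema.
df.to_sql("parquet_staging", engine, schema="dbo", if_exists="append", index=False)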
Related
Every time my Azure pipeline runs, a new file gets created in Azure Data Lake Storage, so I want my external table (already created for this table) to point to the latest file created in the data lake.
I have multiple Parquet files of the same table in blob storage, and we want to read the latest Parquet file through an external table in Snowflake.
Have you checked out this section in the Snowflake documentation? It covers the steps required to configure automatic refresh of external tables using Azure Event Grid. If this is not suitable for your use case, can you provide more detail on your issue and why?
I have a large 700 GB CSV file in an Azure Blob Container.
I am using Azure Synapse to transform the column names and some of the data, and to sink it into a table.
I am unable to sink it to the Table Storage in another Azure Data Lake Storage account.
Why can't I choose Azure Table Storage? Do I have to add an intermediate Parquet layer to stage the data between the file and the table? Please assist.
Azure Table Storage is not supported as a data flow sink. The supported sink types are listed in the documentation here:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-sink
You can use the Copy Data tool to move data from Azure Blob Storage to Table Storage.
Refer to the documentation below on copying and transforming data in Azure Blob Storage by using Azure Data Factory or Azure Synapse Analytics:
https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory
The goal is to extract data from Office 365 via Azure Data Factory into SQL Server.
It seems that it is only possible to sink the data from an Office 365 source into the following technologies:
[Screenshots: the Azure copy activity and the available sinks for the Office 365 source]
How can this be done?
Currently, you can only copy data from Office 365 to Azure Blob Storage, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2 in JSON format.
You can copy the data to blob storage using one copy activity first, and then chain another copy activity to copy from storage into SQL Server.
Based on my understanding, you need to select a binary-format dataset when configuring the source and sink.
The sink then presents the options available for binary-format sources.
So the first step would be to copy the data into blob storage as a file, and then copy from the file into SQL Server.
I need to export the data of a table in Snowflake to an Excel or CSV file. Can I get the data directly onto my local machine without any staging such as an S3 bucket or similar?
If your table is too big to query in the UI and download the results to a CSV or tab-delimited file, then you'll need to leverage COPY INTO {location} as your Snowflake-native solution to get data out. You can leverage Snowflake's internal staging if you don't have your own S3 bucket, and then use SnowSQL to GET the file from the internal stage to your local machine.
For more information, see https://docs.snowflake.net/manuals/user-guide/intro-summary-unloading.html
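As a rough sketch of that COPY INTO + GET flow, driven here through the Python connector instead of SnowSQL (the account, credentials, table, and paths below are placeholders; it assumes snowflake-connector-python is installed):

import snowflake.connector

# Placeholder connection details -- substitute your own account and credentials.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="my_wh", database="my_db", schema="public",
)
cur = conn.cursor()
# Unload the table to the user's internal stage as gzipped CSV files.
cur.execute("""
    COPY INTO @~/unload/my_table
    FROM my_table
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' COMPRESSION = GZIP)
    OVERWRITE = TRUE
""")
# Download the staged files to a local directory (the directory should already exist).
cur.execute("GET @~/unload/my_table file:///tmp/my_table/")
conn.close()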
You can export up to 1M rows directly into Excel with the Snowflake ODBC connector:
1) Install the Snowflake ODBC driver.
2) Configure the driver with your Snowflake credentials.
3) In Excel, go to Data -> Get Data -> From Other Sources -> From ODBC.
If it's small enough that you don't mind the single-stream download, you could write a little program to make this happen, like a small Python utility that leverages the Snowflake connector to run your query and turn the data into a local CSV.
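A minimal sketch of such a utility, assuming snowflake-connector-python is installed (the connection details, table, and output file name are placeholders):

import csv
import snowflake.connector

# Placeholder connection details -- substitute your own account and credentials.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="my_wh", database="my_db", schema="public",
)
try:
    cur = conn.cursor()
    cur.execute("SELECT * FROM my_table")                       # the query you want to export
    with open("my_table.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])    # header row from column metadata
        for row in cur:                                          # stream rows to disk
            writer.writerow(row)
finally:
    conn.close()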
Architectural/perf question here.
I have an on-premises SQL Server database with ~200 tables totaling ~10 TB.
I need to make this data available in Azure in Parquet format for Data Science analysis via HDInsight Spark.
What is the optimal way to copy/convert this data to Azure (Blob storage or Data Lake) in Parquet format?
Due to the manageability aspect of the task (~200 tables), my best shot was: extract the data locally to a file share via sqlcmd, compress it as csv.bz2, and use Data Factory to copy the file share (with 'PreserveHierarchy') to Azure. Finally, run PySpark to load the data and save it as .parquet (a rough sketch of that step is included below).
Given the table schemas, I can auto-generate the SQL data-extract and Python scripts from the SQL database via T-SQL.
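For context, the PySpark step I have in mind is roughly the following (the storage account, container, and table paths are placeholders, and an explicit per-table schema could be generated instead of relying on inference):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv-bz2-to-parquet").getOrCreate()

# Placeholder locations -- one input/output pair per table, driven by the generated scripts.
src = "wasbs://staging@mystorageaccount.blob.core.windows.net/mytable/*.csv.bz2"
dst = "wasbs://curated@mystorageaccount.blob.core.windows.net/parquet/mytable/"

df = (spark.read
      .option("header", "true")        # assumes the sqlcmd extracts include a header row
      .option("inferSchema", "true")   # or pass an explicit schema built from the table DDL
      .csv(src))                       # Spark decompresses the .bz2 files transparently

df.write.mode("overwrite").parquet(dst)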
Are there faster and/or more manageable ways to accomplish this?
ADF matches your requirement perfectly, with both one-time and schedule-based data movement.
Try the Copy Wizard in ADF. With it, you can move on-prem SQL Server data directly to Blob/ADLS in Parquet format with just a couple of clicks.
Copy Activity Overview