I have a SQL query that runs on three Snowflake tables hosted in an AWS account. I would like to stream any new records based on the output of my SQL to an S3 bucket, possibly using Kafka or any other streaming service. What are my options to implement this?
You can unload data directly into an S3 bucket:
Create a storage integration.
Create a stage, or specify the bucket URL directly in the query.
copy into s3://mybucket/unload/ from mytable storage_integration = s3_int;
Ref: https://docs.snowflake.com/en/user-guide/data-unload-considerations.html
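A minimal sketch of those steps (the integration name, stage name, bucket path, and SELECT are placeholders; substitute your own names and the SQL over your three tables):

CREATE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<account_id>:role/<role_name>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket/unload/');

CREATE STAGE my_unload_stage
  URL = 's3://mybucket/unload/'
  STORAGE_INTEGRATION = s3_int;

-- unload the query result as compressed CSV files under the unload/ prefix
COPY INTO @my_unload_stage/unload_
  FROM (SELECT * FROM mytable)   -- replace with the SQL over your three tables
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);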
I have created a data flow within Azure Synapse to:
take data from a dedicated SQL pool
perform some transformations
send the resulting output to parquet files
I am then creating a view based on the resulting parquet file using OPENROWSET, to allow Power BI to use the data via the built-in serverless SQL pool.
My issue is that whatever file name I enter on the integration record, the parquet files always look like part-00000-2a6168ba-6442-46d2-99e4-1f92bdbd7d86-c000.snappy.parquet or similar.
Is there a way to have a fixed file name which is updated each time the pipeline is run? Or, alternatively, is there an automated way to update the parquet file to which the view refers each time the pipeline is run?
I am fairly new to this kind of integration, so if there is a better way to achieve this whole thing then please let me know.
Azure Synapse Data Flows - parquet file names not working
I reproduced the same scenario and got the file name shown in the image below.
To get a fixed name for the sink file, set the Sink settings as follows:
File name Option: Output to single file
Output to single file: tgtfile (give the file name)
In Optimize, select Single partition.
The file name then matches these settings.
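With a fixed sink file name, the serverless SQL view only needs to be created once. A rough sketch of such a view, assuming a data source named my_datasource and the folder and file name written by the sink (all names here are placeholders):

CREATE VIEW dbo.vw_tgtfile AS
SELECT *
FROM OPENROWSET(
    BULK 'output/tgtfile.parquet',   -- path produced by the data flow sink
    DATA_SOURCE = 'my_datasource',
    FORMAT = 'PARQUET'
) AS result;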
How to schedule a task to load a CSV file to an internal stage daily without using any scheduler... the source is a local file path and the target is a Snowflake table.
Have you explored Snowpipe with auto_ingest?
You set up a notification service; on AWS this is a combination of SQS and SNS that calls Snowpipe to ingest new files.
https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3.html
There is a similar mechanism for Azure.
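A minimal sketch of such a pipe, assuming an external stage @csv_stage already points at the bucket that receives the files and that my_table and the file format match your data (names are placeholders):

CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table
  FROM @csv_stage
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

On AWS, SHOW PIPES then returns a notification_channel (an SQS queue ARN) to configure in the bucket's event notifications.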
I am new to Azure Data Factory and trying to solve a business use case where our data is stored in Azure Blob Storage and needs to be copied into Salesforce using Azure Data Factory. I have done a lot of research on this, but all the examples I found are for copying data from Salesforce to SQL Server using Azure Data Factory.
So far I have been able to get the access token from Salesforce by using a Web Activity in Azure Data Factory, and then connecting that Web Activity to another Web Activity that copies bulk data into Salesforce using the SF Bulk API. But for some reason, when the activity tries to read the CSV file stored in Azure Blob Storage, it gives me the following error:
Below is my pipeline which fails when the second web activity runs:
Below is the screenshot of my csv file:
It would be really helpful if anyone has any idea why the Azure pipeline can't read my CSV file and how I can resolve this error.
Thanks for your help.
I am attempting to create a temp table to store the values of an xlsx file in my Azure Blob Storage. I have followed numerous Microsoft articles now, and I am under the impression that I should be using SELECT * FROM OPENROWSET(); this seems to be working, or at least selecting something.
Here is my code:
SELECT * INTO ##TempTest FROM OPENROWSET(BULK 'test.xlsx',
DATA_SOURCE = 'DevStoreAccount', SINGLE_CLOB) AS a;
SELECT * FROM ##TempTest
This all runs fine, but the output is not what I am expecting. Surely this should return all the columns and rows from the Excel file? Or am I mistaken?
The above code returns the following:
What exactly is it returning and should I be doing something different? Any help would really be appreciated.
I'm trying this route as the columns in the Excel file could change at any time, so I need to create my tables dynamically.
I'd recommend checking this thread; although the post is old, it is still relevant to your question.
The approach taken for a similar scenario:
1- Create and update Excel file using Open XML SDK
2- Upload Excel Template in Azure BLOB
3- Download Excel template in azure web role local storage
4- Read and update excel file from azure web role local storage
5- Upload updated excel in Azure BLOB.
You could also use another similar concept as mentioned here
Downloading the Excel file as a Stream from BLOB
Creating the Excel document using the Open XML SDK
After editing, saving the document back to a Stream
Uploading the Stream back to BLOB
I am using the Snowflake cloud data warehouse, which, like Teradata, hosts data. I am able to run queries and get results in the web UI itself. But I am unclear how one can export the results to a local PC so that we can report based on the data.
Thanks in advance
You have two options, both of which use sfsql, which is based on henplus. The first option is to export the result of your query to an S3 staging file as shown below:
CREATE STAGE my_stage URL='s3://loading/files/' CREDENTIALS=(AWS_KEY_ID='****' AWS_SECRET_KEY='****');
COPY INTO @my_stage/dump
FROM (select * from orderstiny limit 5) file_format=(format_name='csv' compression='gzip');
The other option is to capture the SQL result into a file.
test.sql:
set-property column-delimiter ",";
set-property sql-result-showheader off;
set-property sql-result-showfooter off;
select current_date() from dual;
$ ./sfsql < test.sql > result.txt
For more details and help, log in to your Snowflake account and access the online documentation, or post your question to Snowflake Support via the support portal, which is accessible through the Snowflake help section: Help -> Support Portal.
Hope this helps.
You can use a COPY command to export a table (or query results) into a file on S3 (using "stage" locations), and then a GET command to save it onto your local filesystem. You can only do this from the "sfsql" Snowflake command-line tool (not from the web UI).
Search the documentation for "unloading"; you'll find more info there.
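A rough sketch of that flow from the command-line client, using the user stage @~ as the intermediate location (path, query, and file format are placeholders):

-- unload the query result into the user stage
COPY INTO @~/unload/result_
  FROM (SELECT * FROM mytable)
  FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP);

-- download the unloaded files to the local filesystem
GET @~/unload/ file:///tmp/;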
You can download the data directly from Snowflake to the local filesystem, without staging it in S3 or redirecting via a Unix pipe.
Use COPY INTO <location> to unload table data to the table stage:
https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html
snowsql$> copy into @%test_table/result/data_ from test_table
file_format = (TYPE='[FILE_TYPE]' compression='[COMPRESSION_TYPE]');
Use the GET command to download data from the table stage to the local filesystem:
https://docs.snowflake.net/manuals/sql-reference/sql/get.html
snowsql$> get @%test_table/result/data_ file:///tmp/;