(Submitting this on behalf of a Snowflake User, hoping for additional insight or alternative recommendations...)
We are trying to copy data from Azure Blob Storage into Snowflake tables using the COPY INTO statement.
Creating the stage requires a SAS token.
If I want to run this COPY INTO script against the same stage, how can I automate recreating the stage with a new SAS token every day?
Is there any way other than a SAS token to access the blob in Azure?
Also, can we create a stored procedure to automate all of these tasks:
--> Creating the stage with the SAS token
--> Creating the file format
--> Running the COPY INTO statement
Please help me
Recommendation #1:
I recommend taking a look at Tasks, or reviewing the Technology Partners section on the Snowflake website to find the ETL/ELT tool that works best for your scenario. Most tools have native integration with Azure, so you don't need to manually generate a token; it is handled using keys.
Recommendation #2:
I don't think your challenge here relates to the tool you are using, but rather to the expiration of your SAS token. Are you intentionally expiring your token every 24 hours? If you extend its duration, you won't have to create a new stage with a new SAS token every day.
If you are trying to rotate SAS tokens every day, then my suggestion would be to create a script (Python or something) that requests the new token from Azure and then recreates your stage with it. Stored procedures and Tasks cannot retrieve the SAS token from Azure for you, so I'm not sure how you would fully automate this process with an SP or Task alone.
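A minimal sketch of such a rotation script, assuming the azure-storage-blob and snowflake-connector-python packages and access to the storage account key; every name (storage account, container, stage, file format, table, warehouse) is a placeholder. It could be scheduled daily with cron, Windows Task Scheduler, or an Azure Function timer.

```python
# Hypothetical daily job: mint a fresh SAS token, then recreate the Snowflake
# stage and file format and run COPY INTO. All object names are placeholders.
from datetime import datetime, timedelta, timezone

import snowflake.connector
from azure.storage.blob import ContainerSasPermissions, generate_container_sas

# 1. Ask Azure for a new SAS token (valid ~25 hours so consecutive runs overlap).
sas_token = generate_container_sas(
    account_name="mystorageacct",
    container_name="mycontainer",
    account_key="<storage-account-key>",
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=25),
)

# 2. Recreate the stage with the new token and run the load.
conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    warehouse="LOAD_WH", database="MY_DB", schema="PUBLIC",
)
try:
    cur = conn.cursor()
    cur.execute(f"""
        CREATE OR REPLACE STAGE azure_stage
          URL = 'azure://mystorageacct.blob.core.windows.net/mycontainer'
          CREDENTIALS = (AZURE_SAS_TOKEN = '{sas_token}')
    """)
    cur.execute("""
        CREATE FILE FORMAT IF NOT EXISTS my_csv_format
          TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1
    """)
    cur.execute("""
        COPY INTO my_table
        FROM @azure_stage
        FILE_FORMAT = (FORMAT_NAME = my_csv_format)
    """)
finally:
    conn.close()
```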
Any other recommendations or alternative work-arounds?
Related
I want to move all data older than 90 days from one Azure SQL server to a different Azure SQL server, and after moving it I need to delete the moved data from the first server.
I want to run these steps on a daily basis.
I am new to Azure and was able to do this with Azure Data Factory. Can you please suggest any other approach that might be better suited?
You are already using the best approach.
Azure Data Factory is easy to use when it comes to extracting and copying data between services. It also provides trigger scheduling, i.e., triggering the copy pipeline after a specific interval of time or on an event. Refer to Create a trigger that runs a pipeline on a schedule.
If the volume of data is large, you can reconfigure the Integration Runtime (IR) resources (compute type and core count) to overcome performance issues, if required.
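For clarity, regardless of how it is orchestrated, the daily job boils down to two steps: copy rows older than 90 days to the second server, then delete them from the first. The sketch below shows that logic with pyodbc purely for illustration; in ADF the same SELECT would be the copy activity's source query and the DELETE would run in a follow-up stored procedure or script activity. The table and column names (dbo.Orders, CreatedDate) and connection values are assumptions.

```python
# Illustration only: the archive-then-delete logic the daily pipeline performs.
# Table, column, server, and credential values are all assumptions.
from datetime import datetime, timedelta, timezone

import pyodbc

SOURCE = ("Driver={ODBC Driver 18 for SQL Server};Server=tcp:src.database.windows.net;"
          "Database=SourceDb;Uid=<user>;Pwd=<password>")
TARGET = ("Driver={ODBC Driver 18 for SQL Server};Server=tcp:dst.database.windows.net;"
          "Database=ArchiveDb;Uid=<user>;Pwd=<password>")

# Fix the cutoff once so the SELECT and the DELETE see the same boundary.
cutoff = datetime.now(timezone.utc) - timedelta(days=90)

with pyodbc.connect(SOURCE) as src, pyodbc.connect(TARGET) as dst:
    src_cur, dst_cur = src.cursor(), dst.cursor()

    # 1. Read the rows that are more than 90 days old.
    src_cur.execute(
        "SELECT Id, CustomerId, Amount, CreatedDate FROM dbo.Orders WHERE CreatedDate < ?",
        cutoff,
    )
    rows = [tuple(r) for r in src_cur.fetchall()]

    if rows:
        # 2. Copy them to the second server and commit there first.
        dst_cur.fast_executemany = True
        dst_cur.executemany(
            "INSERT INTO dbo.Orders (Id, CustomerId, Amount, CreatedDate) VALUES (?, ?, ?, ?)",
            rows,
        )
        dst.commit()

        # 3. Only then delete the moved rows from the source.
        src_cur.execute("DELETE FROM dbo.Orders WHERE CreatedDate < ?", cutoff)
        src.commit()
```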
What steps should be taken to migrate a historical data load from Teradata to Snowflake?
Imagine there is 200TB+ of historical data combined from all tables.
I am thinking of two approaches, but I don't have enough expertise and experience executing them, so I'm looking for someone to fill in the gaps and offer some suggestions.
Approach 1 - Using TPT/FEXP scripts
I know that TPT/FEXP scripts can be written to generate files for a table. How can I create a single script that can generate files for all the tables in the database? (Creating 500-odd scripts, one per table, is impractical.)
Once this script is ready, how is it executed in practice? Do we create a shell script and schedule it through an enterprise scheduler like Autosys or Tidal?
Once these files are generated, how do you split them on a Linux machine if each file is huge (the recommended size for loading into Snowflake is between 100 and 250 MB)?
How do we move these files to Azure Data Lake?
Use COPY INTO / Snowpipe to load into Snowflake Tables.
Approach 2
Using ADF copy activity to extract data from Teradata and create files in ADLS.
Use COPY INTO/ Snowpipe to load into Snowflake Tables.
Which of these two is the recommended approach?
In general, what are the challenges faced with each of these approaches?
Using ADF will be a much better solution. It also allows you to design a data lake as part of your solution.
You can design a generic solution that imports all the tables provided in a configuration. For this you can choose the recommended file format (Parquet), the size of these files, and parallel loading.
The challenge you will probably encounter is a poorly working ADF connector for Snowflake; here you will find my recommendations on how to bypass the connector problem and how to use Data Lake Gen2:
Trouble loading data into Snowflake using Azure Data Factory
More recommendations on how to structure Azure Data Lake Storage Gen2 can be found here: Best practices for using Azure Data Lake Storage Gen2
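Whichever approach produces the files, the final load into Snowflake looks roughly the same. Below is a minimal sketch of the COPY INTO step, assuming an external stage (here called adls_stage) already points at the ADLS Gen2 container and the extracted files are Parquet; the table, stage, and warehouse names are placeholders, and Snowpipe with auto-ingest would be the event-driven alternative to running it in batch.

```python
# Sketch of the load step shared by both approaches: COPY INTO from Parquet
# files landed in ADLS Gen2. All object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    warehouse="MIGRATION_WH", database="EDW", schema="STAGING",
)
try:
    cur = conn.cursor()
    # One COPY INTO per target table; in practice this loop would be driven by
    # the same configuration that lists the Teradata tables to migrate.
    for table in ["CUSTOMER", "ORDERS", "LINEITEM"]:
        cur.execute(f"""
            COPY INTO STAGING.{table}
            FROM @adls_stage/teradata/{table.lower()}/
            FILE_FORMAT = (TYPE = PARQUET)
            MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
        """)
finally:
    conn.close()
```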
I have been tasked with creating an application or using an existing one (Access, Excel, Power Apps) that allows users to read Snowflake data and also allow update, insert and delete operations. I am pretty sure Excel, Access and PowerApps are read only. PowerApps would also run 10 bucks a month for an app that currently only needs to be used once a quarter.
I was hoping I could use ODBC, but it looks like that only reads, with no write-back. I do have the ability to use a SQL Server as a middleman. I thought I would use ADF to mirror the modified data with truncate-and-loads to Snowflake, but if I could skip that link in the chain it would be preferable.
Thoughts?
There are a couple of tools that can help you and business users read and write back to Snowflake. Many users then use Streams and Tasks on the table that is updated to automate further processing on Snowflake.
Some examples:
Open-source Excelerator - Excel plug-in to write to Snowflake
Sigma Computing - a cloud-native, serverless Excel / BI tool
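As a rough sketch of the Streams and Tasks pattern mentioned above (all object names are assumptions: a quarterly_edits table receives the write-backs, and a task copies the captured changes into an edits_history table whose columns match the stream output):

```python
# Sketch: capture write-backs with a stream and process them on a schedule
# with a task. All object names are assumptions.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    warehouse="XS_WH", database="APP_DB", schema="PUBLIC",
)
try:
    cur = conn.cursor()
    # Record inserts/updates/deletes made by the Excel or BI tool.
    cur.execute("CREATE OR REPLACE STREAM edits_stream ON TABLE quarterly_edits")
    # Process the captured changes every 15 minutes, but only when there are any.
    cur.execute("""
        CREATE OR REPLACE TASK apply_edits_task
          WAREHOUSE = XS_WH
          SCHEDULE = '15 MINUTE'
          WHEN SYSTEM$STREAM_HAS_DATA('EDITS_STREAM')
        AS
          INSERT INTO edits_history
          SELECT * FROM edits_stream
    """)
    # Tasks are created suspended; resume to start the schedule.
    cur.execute("ALTER TASK apply_edits_task RESUME")
finally:
    conn.close()
```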
I made a Spotify app that analyzes user data and manages interactive features by writing the API responses to a PostgreSQL database. The developer rules state that basically I have to delete the data when the user is not actively using my app.
Is there a way to automate this on the server (I'm using AWS Lightsail/Ubuntu) so it runs daily? Would I need to add a datetime column to all of my tables and follow one of these: https://www.the-art-of-web.com/sql/trigger-delete-old/? Or is there a better way?
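One common way to do this, following the approach the question describes: track a last-activity timestamp per user and run a small cleanup script from cron every day instead of per-table triggers. A hedged sketch with psycopg2; the table names, the users.last_active column, and the 24-hour window are all assumptions.

```python
# Hypothetical daily cleanup (scheduled with cron, e.g. "0 4 * * *"): delete
# Spotify data for users who have not been active recently. Table/column names
# and the 24-hour retention window are assumptions.
import psycopg2

RETENTION = "24 hours"

# Tables that hold per-user API responses and carry a user_id column.
USER_DATA_TABLES = ["top_tracks", "top_artists", "listening_history"]

conn = psycopg2.connect("dbname=spotify_app user=app_user password=<password> host=localhost")
try:
    with conn:                      # commits the transaction on success
        with conn.cursor() as cur:
            for table in USER_DATA_TABLES:
                # users.last_active is assumed to be updated by the app on
                # every API interaction.
                cur.execute(
                    f"""
                    DELETE FROM {table} t
                    USING users u
                    WHERE t.user_id = u.id
                      AND u.last_active < now() - %s::interval
                    """,
                    (RETENTION,),
                )
finally:
    conn.close()
```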
I'd like to create an environment like the one shown in the pic. Are there any good tutorials out there that will help me create a copy of my production database (SQL Server in Azure) and then, every night at a specific time, load all the data from production into the copy (so I have data to work with in the copy while developing)?
There are several options that you can use.
Please check this blog post, and look under option 2 in particular.
You could control when this runs by using the Azure Scheduler service to trigger an Azure WebJob at midnight. The WebJob would execute the "CREATE DATABASE AS COPY OF" command.
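For reference, a minimal sketch of what that WebJob could run, shown here with pyodbc; the command has to execute against the master database with autocommit on, and the server, database, and credential values are placeholders. Dropping the previous night's copy first keeps the name stable, and since the copy runs asynchronously, a real job would poll sys.dm_database_copies before pointing developers at it.

```python
# Sketch of the nightly refresh: drop yesterday's copy and recreate it from
# production. Server, database, and credential values are placeholders.
import pyodbc

MASTER = (
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:myserver.database.windows.net;Database=master;"
    "Uid=<admin>;Pwd=<password>"
)

# CREATE DATABASE ... AS COPY OF cannot run inside a user transaction,
# so open the connection with autocommit enabled.
conn = pyodbc.connect(MASTER, autocommit=True)
try:
    cur = conn.cursor()
    cur.execute("DROP DATABASE IF EXISTS ProdCopy_Dev")
    cur.execute("CREATE DATABASE ProdCopy_Dev AS COPY OF ProdDb")
    # The copy continues in the background; poll sys.dm_database_copies
    # until the row disappears to know it has finished.
finally:
    conn.close()
```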