Snowflake Loading Flat Files - snowflake-cloud-data-platform

My company is looking to possibly migrate to Snowflake from SQL Server. From what I've read in the Snowflake documentation, flat files (CSV) can be uploaded to a stage with PUT, and COPY INTO then loads the data into a physical table.
Example: put file://c:\temp\employees0*.csv @sf_tuts.public.%emp_basic;
My question is: can this be automated via a job or script within Snowflake? This includes the COPY INTO command.

Yes, there are several ways to automate jobs in Snowflake, as others have already commented. One option is to put your code in a stored procedure and call it from a Task that runs on a schedule.
There is also a command-line client for Snowflake called SnowSQL.
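As a rough illustration of the SnowSQL route, the sketch below builds a two-statement script (PUT, then COPY INTO) that a scheduler outside Snowflake could run. The file pattern and table name come from the question; the function name, the file format options, and PURGE = TRUE are assumptions to adapt to your own files.

```python
def build_load_script(local_pattern: str, table: str) -> str:
    """Return a SnowSQL script that stages local files and loads them.

    `table` is a fully qualified name such as "sf_tuts.public.emp_basic";
    the files are uploaded to that table's own stage (the "%table" stage).
    """
    database, schema, name = table.split(".")
    table_stage = f"@{database}.{schema}.%{name}"
    statements = [
        # PUT uploads the local files to the table stage.
        f"PUT file://{local_pattern} {table_stage};",
        # COPY INTO loads the staged files into the table.
        # The file format options are assumptions about the CSV layout.
        f"COPY INTO {table} FROM {table_stage} "
        "FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"') "
        "PURGE = TRUE;",
    ]
    return "\n".join(statements) + "\n"


script = build_load_script(r"c:\temp\employees0*.csv", "sf_tuts.public.emp_basic")
print(script)
```

Saved to a file (say load_emp.sql, a hypothetical name), the script could be run with snowsql -f load_emp.sql plus your connection options, from Windows Task Scheduler or cron.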

Related

Load all CSVs from path on local drive into AzureSQL DB w/Auto Create Tables

I frequently need to validate CSVs submitted from clients to make sure that the headers and values in the file meet our specifications. Typically I do this by using the Import/Export Wizard and have the wizard create the table based on the CSV (file name becomes table name, and the headers become the column names). Then we run a set of stored procedures that checks the information_schema for said table(s) and matches that up with our specs, etc.
Most of the time, this involves loading multiple files at a time for a client, which becomes very time consuming and laborious very quickly when using the Import/Export Wizard. I tried using an xp_cmdshell SQL script to load everything from a path at once and get the same result, but xp_cmdshell is not supported by Azure SQL DB.
https://learn.microsoft.com/en-us/azure/azure-sql/load-from-csv-with-bcp
The above says that one can load using bcp, but it also requires the table to exist before the import... I need the table structure to mimic the CSV. Any ideas here?
Thanks
If you want to load the data into your target SQL DB, you can use Azure Data Factory (ADF) to upload your CSV files to Azure Blob Storage, and then use a Copy Data activity to load the data from those CSV files into Azure SQL DB tables, without creating the tables up front.
ADF supports 'auto create' of sink tables.
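For reference, here is a sketch of what the relevant part of a Copy activity definition might look like, written as a Python dict and printed as JSON. The dataset names and the function are hypothetical, and the property names (tableOption, AzureSqlSink, DelimitedTextSource) are as I recall them from the Azure SQL Database connector documentation, so check them against the current docs before relying on this.

```python
import json


def copy_activity_with_auto_create(source_dataset: str, sink_dataset: str) -> dict:
    """Build a Copy activity definition whose SQL sink auto-creates the table."""
    return {
        "name": f"Copy_{source_dataset}_to_{sink_dataset}",
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            # Source: delimited text (CSV) files, e.g. in Blob Storage.
            "source": {"type": "DelimitedTextSource"},
            # Sink: create the table from the source schema if it is missing.
            "sink": {"type": "AzureSqlSink", "tableOption": "autoCreate"},
        },
    }


# "ClientCsvBlob" and "ClientSqlTable" are placeholder dataset names.
activity = copy_activity_with_auto_create("ClientCsvBlob", "ClientSqlTable")
print(json.dumps(activity, indent=2))
```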

How do I make an offline backup of a Snowflake database?

I have an application that used to run on Snowflake but was migrated to a SQL Managed Instance. I used SnowHub to export all the DDL and then zipped it up, in case I ever need to look up some source code for a deprecated report or other process.
But how do I dump the entire database to a compressed file? You never know when an audit might come up and I might need to restore the data to the state it was in a few months or years ago.
You have to unload the data (COPY INTO a location) from your tables to an internal or external stage in your preferred format.
https://docs.snowflake.com/en/user-guide/data-unload-overview.html
Unfortunately, you cannot unload the whole database with one statement, but need to run one COPY INTO per table. As an alternative you could write a procedure looping over your metadata about tables and run COPY INTO per loop execution.
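The per-table loop could look something like the sketch below, which only generates the COPY INTO statements and leaves execution to whatever client you use. The stage name @backup_stage, the folder layout, and the file format options are assumptions; the table list would normally come from a query against INFORMATION_SCHEMA.TABLES.

```python
def build_unload_statements(tables, stage="@backup_stage"):
    """Return one COPY INTO <location> statement per table.

    `tables` is a list of fully qualified names (db.schema.table).
    `stage` is a hypothetical internal or external stage.
    """
    statements = []
    for table in tables:
        # One folder per table, e.g. @backup_stage/mydb/public/orders/
        prefix = table.lower().replace(".", "/")
        statements.append(
            f"COPY INTO {stage}/{prefix}/ FROM {table} "
            "FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP "
            "FIELD_OPTIONALLY_ENCLOSED_BY = '\"') "
            "HEADER = TRUE OVERWRITE = TRUE;"
        )
    return statements


# Placeholder table names for illustration only.
for stmt in build_unload_statements(["MYDB.PUBLIC.ORDERS", "MYDB.PUBLIC.CUSTOMERS"]):
    print(stmt)
```

With an internal stage, the compressed files can then be downloaded with GET and archived next to the DDL export.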

How to get Data from a Mysql Database to Snowflake

Is there any clever way to get my data from a MySQL database into Snowflake?
I found two possible ways so far:
Option 1: Put a Snowpipe on top of the MySQL database, and the pipeline converts the data automatically.
Option 2: Convert the tables manually into CSV, store them locally, and load them into Snowflake via staging.
To me it seems strange to convert every table into a CSV first. Can I not just push a SQL dump file to Snowflake? Can I also schedule a reload task in Snowflake, so that either option 1 or option 2 gets triggered automatically?
Best
NicBeC24
I found some very good information regarding MySQL-Snowflake-migrations here: https://hevodata.com/blog/mysql-to-snowflake-data-migration-steps/
The main steps from the webpage above are:
Exporting data from MySQL
Taking care of data types
Stage your files into Snowflake (Internal/External stage)
Copy the staged files into the table
If the SQL dump is just a ".sql" file in ANSI SQL, then yes, you can copy and paste it into your Snowflake worksheet and execute it there.
Regarding scheduling: Yes, in Snowflake there is a functionality called Tasks: https://docs.snowflake.com/en/user-guide/tasks-intro.html You can use them to schedule your COPY INTO-command.
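To make the scheduling part concrete, here is a small sketch that generates the two statements involved: a CREATE TASK wrapping a COPY INTO, and the ALTER TASK ... RESUME that is needed because tasks start out suspended. The task, warehouse, table, and stage names are placeholders, and the file format is an assumption. The task only covers the COPY INTO step, so the files still have to reach the stage some other way.

```python
def build_copy_task(task_name, warehouse, cron, table, stage):
    """Return the SQL statements that schedule a COPY INTO as a Snowflake Task.

    `cron` is a five-field cron expression, evaluated here in UTC.
    """
    create = (
        f"CREATE OR REPLACE TASK {task_name}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  SCHEDULE = 'USING CRON {cron} UTC'\n"
        "AS\n"
        f"  COPY INTO {table} FROM {stage}\n"
        "  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);"
    )
    # Newly created tasks are suspended until they are resumed.
    resume = f"ALTER TASK {task_name} RESUME;"
    return [create, resume]


# All names below are hypothetical; the schedule is 02:00 UTC every day.
for stmt in build_copy_task(
    "reload_customers", "load_wh", "0 2 * * *", "mydb.public.customers", "@mysql_exports/customers/"
):
    print(stmt)
    print()
```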

How to use pre-copy script in Azure Data Factory to remove null/special character rows?

I am moving data within folder from Azure Data Lake to a SQL Server using Azure Data Factory (ADF).
The folder contains hundreds of .csv files. However, one inconsistent problem with these csv's is that some (not all) have a final row that contains a special character, which when trying to load to a sql table with datatypes other than NVARCHAR(MAX) will fail. To get around this, I have to first use ADF to load the data into staging tables where all columns are set to NVARCHAR(MAX), then I insert those rows that do not contain a special character into tables that have the appropriate data type.
This is a weekly process covering over a terabyte of data, and it takes forever to move, so I am looking for ways to import directly into my final tables without a staging step.
I notice that there is a 'pre-copy script' field that can execute before the load to sql server. I want to add code that will allow me to parse out special characters OR null rows before loading to sql server.
I am unsure of how to approach this since the csv's would not be stored in a table, so SQL code wouldn't work. Any guidance on how I can utilize the pre-copy script to clean my data before loading it into sql server?
The pre-copy script is a script that you run against the database before copying new data in, not to modify the data you are ingesting.
I already answered this on another question, providing a possible solution using an intermediate table: Pre-copy script in data factory or on the fly data processing
Hope this helped!
You could consider a stored procedure. https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database#invoking-stored-procedure-for-sql-sink
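Whichever component ends up doing the cleaning, the row filter itself is simple. The sketch below shows the logic in plain Python, outside of ADF, purely as an illustration: it is not an ADF feature, and it assumes that "special character" means a non-printable or control character, which the question does not actually specify.

```python
def is_clean_row(line: str) -> bool:
    """Return True if a CSV line is non-empty and has no control characters."""
    stripped = line.rstrip("\r\n")
    if not stripped.strip():
        # Drop empty (null) rows.
        return False
    # Tabs are allowed; any other non-printable character rejects the row.
    return all(ch.isprintable() or ch == "\t" for ch in stripped)


def filter_rows(lines):
    """Yield only the lines that pass is_clean_row."""
    for line in lines:
        if is_clean_row(line):
            yield line


# Example: the last row holds a stray control character and is dropped.
sample = ["id,name\n", "1,Alice\n", "\n", "\x1a\n"]
print(list(filter_rows(sample)))
```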

Speeding Up ETL DB2 to SQL Server?

I came across this blog post when looking for a quicker way of importing data from a DB2 database to SQL Server 2008.
http://blog.stevienova.com/2009/05/20/etl-method-fastest-way-to-get-data-from-db2-to-microsoft-sql-server/
I'm trying to figure out how to achieve the following:
3) Create a BULK Insert task, and load up the file that the execute process task created. (note you have to create a .FMT file for fixed width import. I create a .NET app to load the FDF file (the transfer description) which will auto create a .FMT file for me, and a SQL Create statement as well – saving time and tedious work)
I've got the data in a TXT file and a separate FDF with the details of the table structure. How do I combine them to create a suitable .FMT file?
I couldn't figure out how to create the suitable .FMT files.
Instead I ended up creating replica tables from the source DB2 system in SQL Server and ensured that the column order was the same as what was coming out of the IBM File Transfer Utility.
Using an Excel sheet to control what File Transfers/Tables should be loaded, allowing me to enable/disable as I please, along with a For Each Loop in SSIS I've got a suitable solution to load multiple tables quickly from our DB2 system.
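For anyone who does want to go the .FMT route, the sketch below writes a non-XML bcp format file for a fixed-width text file from a list of column names and widths. It assumes the widths have already been read out of the FDF (parsing the FDF is not shown, since its layout is not described here), that every field is plain character data, and that rows end with CR+LF. The column names are made up, and version "10.0" corresponds to SQL Server 2008.

```python
def build_fmt(columns, version="10.0", row_terminator="\\r\\n"):
    """Return the text of a non-XML bcp format file for a fixed-width file.

    `columns` is a list of (column_name, width_in_characters) in file order.
    Fixed-width fields have no terminator; only the last field carries the
    row terminator, written literally as \\r\\n inside the quotes.
    """
    lines = [version, str(len(columns))]
    last = len(columns)
    for position, (name, width) in enumerate(columns, start=1):
        terminator = row_terminator if position == last else ""
        # Fields: file order, file data type, prefix length, data length,
        # terminator, table column order, table column name, collation.
        lines.append(
            f'{position}\tSQLCHAR\t0\t{width}\t"{terminator}"\t{position}\t{name}\t""'
        )
    return "\n".join(lines) + "\n"


# Hypothetical columns; real names and widths would come from the FDF.
print(build_fmt([("CUSTNO", 7), ("NAME", 30), ("CITY", 20)]))
```

The resulting file can then be referenced from BULK INSERT through its FORMATFILE option.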
