How to get data from a MySQL database to Snowflake

Is there any clever way to get my data from a MySQL database into Snowflake?
I found two possible ways so far:
Option 1: Put Snowpipe on top of the MySQL database and let the pipeline convert the data automatically.
Option 2: I convert the tables manually into CSV files, store them locally, and load them into Snowflake via staging.
It seems strange to me to convert every table into a CSV first. Can I not just push a SQL dump file to Snowflake? Can I also schedule some reload task in Snowflake, so that either option 1 or option 2 gets triggered automatically?
Best
NicBeC24

I found some very good information regarding MySQL-to-Snowflake migrations here: https://hevodata.com/blog/mysql-to-snowflake-data-migration-steps/
The main steps from the webpage above are:
Exporting data from MySQL
Taking care of data types
Staging your files in Snowflake (internal/external stage)
Copying the staged files into the table
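As a rough illustration of the staging and copy steps, a minimal internal-stage load could look like the sketch below. All object names (my_csv_format, mysql_export_stage, customers) and the file path are hypothetical placeholders for your own setup:

-- File format describing the exported CSVs (hypothetical name)
CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;

-- Internal stage to hold the exported files
CREATE OR REPLACE STAGE mysql_export_stage
  FILE_FORMAT = my_csv_format;

-- Upload the MySQL export from your machine (run from SnowSQL, not the worksheet)
PUT file:///tmp/customers.csv @mysql_export_stage;

-- Load everything currently in the stage into the target table
COPY INTO customers
  FROM @mysql_export_stage
  FILE_FORMAT = (FORMAT_NAME = my_csv_format);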
If the SQL dump is just a ".sql" file in ANSI, then yes, of course you can copy and paste it into your Snowflake worksheet and execute it there.
Regarding scheduling: yes, Snowflake has a feature called Tasks: https://docs.snowflake.com/en/user-guide/tasks-intro.html You can use them to schedule your COPY INTO command.
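A minimal sketch of such a task, reusing the hypothetical names from above (the warehouse name load_wh and the schedule are also placeholders):

-- Run the load every night at 03:00 UTC
CREATE OR REPLACE TASK reload_customers
  WAREHOUSE = load_wh
  SCHEDULE = 'USING CRON 0 3 * * * UTC'
AS
  COPY INTO customers
    FROM @mysql_export_stage
    FILE_FORMAT = (FORMAT_NAME = my_csv_format);

-- Tasks are created suspended; resume to start the schedule
ALTER TASK reload_customers RESUME;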

Related

Load all CSVs from path on local drive into AzureSQL DB w/Auto Create Tables

I frequently need to validate CSVs submitted by clients to make sure that the headers and values in the file meet our specifications. Typically I do this by using the Import/Export Wizard and having the wizard create the table based on the CSV (the file name becomes the table name, and the headers become the column names). Then we run a set of stored procedures that checks the information_schema for said table(s) and matches that up with our specs, etc.
Most of the time this involves loading multiple files at a time for a client, which becomes very time-consuming and laborious very quickly when using the Import/Export Wizard. I tried using an xp_cmdshell SQL script to load everything from a path at once to get the same result, but xp_cmdshell is not supported by Azure SQL DB.
https://learn.microsoft.com/en-us/azure/azure-sql/load-from-csv-with-bcp
The above says that one can load using bcp, but it also requires the table to exist before the import... I need the table structure to mimic the CSV. Any ideas here?
Thanks
If you want to load the data into your target SQL DB, you can use Azure Data Factory (ADF) to upload your CSV files to Azure Blob Storage, and then use the Copy Data activity to load the data from the CSV files into Azure SQL DB tables without creating those tables upfront.
ADF supports 'auto create' of sink tables. See this and this.

Snowflake Loading Flat Files

My company is looking at possibly migrating from SQL Server to Snowflake. From what I've read in the Snowflake documentation, flat files (CSV) can be uploaded into a stage and then loaded into a physical table using COPY INTO.
Example: put file://c:\temp\employees0*.csv @sf_tuts.public.%emp_basic;
My question is: can this be automated via a job or script within Snowflake? This includes the COPY INTO command.
Yes, there are several ways to automate jobs in Snowflake, as others have already commented. Putting your code in a stored procedure and calling it via a scheduled task is one option.
There is also a command-line client for Snowflake called SnowSQL.
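A minimal sketch of that pattern, reusing the tutorial table and table stage from the example above; the procedure, task and warehouse names (load_emp_basic, load_emp_basic_task, load_wh) are hypothetical:

-- Wrap the COPY INTO in a stored procedure...
CREATE OR REPLACE PROCEDURE load_emp_basic()
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  COPY INTO sf_tuts.public.emp_basic
    FROM @sf_tuts.public.%emp_basic
    FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
    PURGE = TRUE;          -- remove files from the stage after a successful load
  RETURN 'load complete';
END;
$$;

-- ...and call it on a schedule via a task
CREATE OR REPLACE TASK load_emp_basic_task
  WAREHOUSE = load_wh
  SCHEDULE = '60 MINUTE'
AS
  CALL load_emp_basic();

ALTER TASK load_emp_basic_task RESUME;

Note that the PUT upload itself has to run from a client such as SnowSQL; it cannot be executed from the web UI worksheet, so the file-upload step is typically scripted outside Snowflake or replaced by an external stage.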

How do I make an offline backup of a Snowflake database?

I have an application that used to run on Snowflake but was migrated to a SQL Managed Instance. I used SnowHub to export all the DDL and then zipped it up in case I ever need to look up some source code for a deprecated report or other process.
But how do I dump the entire database to a compressed file? You never know when an audit might come up and I might need to restore the data to the state it was in a few months or years ago.
You have to use data unloading to export data from your tables to an internal/external stage in your preferred format.
https://docs.snowflake.com/en/user-guide/data-unload-overview.html
Unfortunately, you cannot unload the whole database with one statement; you need to run one COPY INTO per table. As an alternative, you could write a procedure that loops over your table metadata and runs a COPY INTO per iteration.
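A rough sketch of that looping approach using Snowflake Scripting; the stage name backup_stage, the procedure name and the schema filter are hypothetical, and each table ends up as gzipped CSV files under its own path in the stage:

-- Loop over the tables in a schema and run one COPY INTO (unload) per table
CREATE OR REPLACE PROCEDURE unload_all_tables()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
  stmt STRING;
  c1 CURSOR FOR
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'PUBLIC'
      AND table_type = 'BASE TABLE';
BEGIN
  FOR rec IN c1 DO
    stmt := 'COPY INTO @backup_stage/' || rec.table_name || '/ FROM PUBLIC.' || rec.table_name ||
            ' FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP) HEADER = TRUE OVERWRITE = TRUE';
    EXECUTE IMMEDIATE stmt;
  END FOR;
  RETURN 'unload complete';
END;
$$;

CALL unload_all_tables();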

How to use pre-copy script in Azure Data Factory to remove null/special character rows?

I am moving data within a folder from Azure Data Lake to SQL Server using Azure Data Factory (ADF).
The folder contains hundreds of .csv files. However, one inconsistent problem with these CSVs is that some (not all) have a final row containing a special character, which causes the load to fail when the target SQL table has data types other than NVARCHAR(MAX). To get around this, I first use ADF to load the data into staging tables where all columns are set to NVARCHAR(MAX), then I insert the rows that do not contain a special character into tables that have the appropriate data types.
This is a weekly process involving over a terabyte of data, and moving it takes a very long time, so I am looking for ways to import directly into my final tables rather than having a staging component.
I notice that there is a 'pre-copy script' field that can execute before the load to SQL Server. I want to add code that will allow me to parse out special-character or null rows before loading to SQL Server.
I am unsure how to approach this, since the CSVs would not be stored in a table, so SQL code wouldn't work. Any guidance on how I can utilize the pre-copy script to clean my data before loading it into SQL Server?
The pre-copy script is a script that you run against the database before copying new data in, not to modify the data you are ingesting.
I already answered this on another question, providing a possible solution using an intermediate table: Pre-copy script in data factory or on the fly data processing
Hope this helped!
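For illustration, a pre-copy script is just a T-SQL statement that ADF executes against the sink before the copy starts, e.g. clearing out a staging or target table (the table name here is hypothetical):

-- Runs against the sink database before ADF copies the new files in
TRUNCATE TABLE dbo.WeeklyLoad_Staging;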
You could also consider a stored procedure sink: https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database#invoking-stored-procedure-for-sql-sink
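A hedged sketch of that approach: the copy activity sink passes each batch of CSV rows to a stored procedure through a table-valued parameter, and the procedure inserts only the clean rows. Every name here (dbo.CsvRowType, dbo.usp_LoadCleanRows, dbo.FinalTable, Col1/Col2) is a hypothetical placeholder for your own schema:

-- Table type matching the CSV layout (wide NVARCHAR columns, like the staging tables)
CREATE TYPE dbo.CsvRowType AS TABLE
(
    Col1 NVARCHAR(MAX),
    Col2 NVARCHAR(MAX)
);
GO

-- Procedure invoked by the ADF sink; filters out null/special-character rows
-- and casts to the real data types on the way into the final table
CREATE PROCEDURE dbo.usp_LoadCleanRows
    @rows dbo.CsvRowType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.FinalTable (Col1, Col2)
    SELECT CAST(Col1 AS INT), Col2
    FROM @rows
    WHERE Col1 IS NOT NULL
      AND Col1 NOT LIKE '%[^0-9]%';   -- reject rows with non-numeric characters in Col1
END;
GO

In the copy activity's sink you would then reference the procedure and the table type (the sqlWriterStoredProcedureName and sqlWriterTableType settings described in the linked documentation).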

Querying a .txt file on a web server

The National Weather Service's Climate Prediction Center maintains data of recent weather data from about 1400 weather stations across the United States. The data for the previous day can always be found at the following address:
http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/prcp_temp_tables/dly_glob1.txt
In an ambitious attempt to store weather data for future reference, I want to store this data by row using SQL Server 2012. Five years ago a similar question was asked, and this answer mentioned the BULK INSERT command. I do not have access to this option.
Is there an option that allows for direct import of a web-hosted text file and does not use the BULK statement? I do not want to save the file, as I plan on automating this process and having it run daily, loading directly into the server.
Update: I have found another option in Ad Hoc Distributed Queries. This option is also unavailable to me based on the nature of the databases in question.
Why do you NOT have access to Bulk Insert? I can't think of a reason that would be disabled on your version of SQL Server.
I can think of a couple ways of doing the work.
#1) Record a macro, using Excel, to do everything from the data import to the parsing of the data sets, and then to saving as a CSV file. I just did it; very easy. Then use BULK INSERT to get the data from the CSV into SQL Server.
#2) Record a macro, using Excel, to do everything from the data import to the parsing of the data sets. Then use a VBA script to send the data to SQL Server. You will find several ideas at the link below.
http://www.excel-sql-server.com/excel-sql-server-import-export-using-vba.htm#Excel%20Data%20Export%20to%20SQL%20Server%20using%20ADO
#3) You could actually use Python or R to get the data from the web. Both have excellent HTML parsing packages. Then, as mentioned in point #1 above, save the data as a CSV (using Python or R) and BULK INSERT into SQL Server.
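Once the parsed data has been saved as a CSV (whether by the Excel macro or by Python/R), the BULK INSERT step from #1 and #3 might look roughly like this; the table name, file path and format options are hypothetical:

-- Load the locally saved CSV into a SQL Server table
BULK INSERT dbo.DailyWeather
FROM 'C:\temp\dly_glob1.csv'
WITH (
    FIRSTROW = 2,            -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);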
R is probably a bit off topic here, but still a viable option. I just did it to test my idea and everything is done in just two lines of code!! How efficient is that!!
X <- read.csv(url("http://www.cpc.ncep.noaa.gov/products/analysis_monitoring/cdus/prcp_temp_tables/dly_glob1.txt"))
write.csv(X, file = "C:\\Users\\rshuell001\\Desktop\\foo.csv")
