Replicate SQL Server data to Snowflake - snowflake-cloud-data-platform

I have a few tables from multiple SQL Server databases that I want to replicate into Snowflake (for analytics purposes). I'm not ready to purchase any external tools.
Is there a way to accomplish this without any tools?

I did this last year and also did not want to spend much money.
I did the following:
Create a BCP script that outputs each table to JSON files
Using the AWS CLI (or, if you want to spend $59, TnT Drive), put the JSON files created by the BCP export into S3
Add a stage in Snowflake pointing to your S3 bucket
Create External Table definitions in Snowflake over your S3 files
Query the data in Snowflake
For my situation, I found that ODBC and Python were slower than BCP-exported files combined with Snowflake External Tables.
bcp "select (select * for json path, without_array_wrapper) from Product" queryout .products.json -c -S "SQLSERVER" -d ProductCatalog -T

If you don't want to spend any money then you probably have 3 categories of options:
Use Snowflake's COPY INTO functionality. This will require you to export your data into flat files and upload them into a stage before you can run the COPY INTO command (there is a minimal sketch of this after the list).
Code your own solution using one of the Snowflake-supported interfaces: ODBC, JDBC, Python, .NET and a few others.
Use a free ETL tool. For example, Talend has a free version which may support Snowflake as a target.
I guess the best solution for you depends on your existing skillset and/or how much time you are prepared to spend learning new technology.
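If you go the COPY INTO route and genuinely want to spend nothing, the simplest variant uses a Snowflake internal stage, so you don't even need your own S3 bucket. A minimal sketch, assuming a CSV export and made-up table, stage and path names:

-- Named internal stage to hold the exported flat files
CREATE OR REPLACE STAGE my_internal_stage;

-- PUT is run from the SnowSQL client; the local path is an example
PUT file:///exports/product_*.csv @my_internal_stage AUTO_COMPRESS = TRUE;

-- Load the staged files into the target table
COPY INTO product
FROM @my_internal_stage
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1);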

Related

Data migration for .SQB files to Snowflake

I need to migrate .SQB files to Snowflake.
I have a data relay where MS SQL Server database backups are saved in .SQB format (Redgate) and made available via SFTP, with full backups every week and hourly backups in between.
Our data warehouse is Snowflake, which already holds the rest of our data from other sources. I'm looking for the simplest, most cost-effective solution to get this data into Snowflake.
My current ETL process is as follows.
An AWS EC2 instance (Windows) downloads the files and applies Redgate's SQL Backup Converter (https://documentation.red-gate.com/sbu7/tools-and-utilities/sql-backup-file-converter) to convert the files to .BAK. This tool requires a license.
Restore the MS SQL database on the same AWS EC2 instance.
Migrate the MS SQL database to Snowflake via Fivetran.
Is there a simpler / better solution? I'd love to eliminate the need for the intermediate EC2 if possible.
The .SQB files come from an external vendor and there is no way to have them change the file format or delivery method.
This isn't a full solution to your problem, but it might help to know that you're okay to use the SQL Backup file converter wherever you need to, free of any licensing restrictions. This is true for all of SQL Backup's desktop and command-line tools. Licensing only gets involved when dealing with the Server Components, but once a .SQB file has been created you're free to use SQBConverter.exe to convert it to a .BAK file wherever you need to.
My advice would be to either install SQL Backup on whichever machine you want to use the tooling on, or just copy all the files from an existing installation. Both should work fine, so pick whichever is easiest for you.
(FYI: I'm a current Redgate software engineer and I used to work on SQL Backup until fairly recently.)
You can do the following:
Step 1: Export Data from SQL Server Using SQL Server Management Studio.
Step 2: Upload the CSV File to an Amazon S3 Bucket.
Step 3: Upload Data to Snowflake From S3 using COPY INTO command.
You can use your own AWS S3 bucket for this and then create an External Stage pointing to the S3 bucket, or you can upload the files into an internal Snowflake stage.
Copy Into from External Stage -
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#loading-files-from-a-named-external-stage
Copy Into from an Internal Stage -
https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html#loading-files-from-an-internal-stage
Creating External Stage-
https://docs.snowflake.com/en/sql-reference/sql/create-stage.html
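For example, the external-stage variant of those steps might look roughly like this; the file format, stage, bucket, credentials and table names are placeholders to adapt, and the linked docs cover the full set of options:

-- File format matching the CSV exported from SQL Server Management Studio
CREATE OR REPLACE FILE FORMAT csv_export_format
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;

-- External stage over your own S3 bucket
CREATE OR REPLACE STAGE sqlserver_export_stage
  URL = 's3://my-bucket/sqlserver-exports/'
  CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
  FILE_FORMAT = (FORMAT_NAME = 'csv_export_format');

-- Load all staged CSV files into the target table
COPY INTO my_table
FROM @sqlserver_export_stage
PATTERN = '.*[.]csv';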

Historical data migration from Teradata to Snowflake

What are the steps to be taken to migrate historical data load from Teradata to Snowflake?
Imagine there is 200TB+ of historical data combined from all tables.
I am thinking of two approaches, but I don't have enough expertise and experience in how to execute them, so I'm looking for someone to fill in the gaps and offer some suggestions.
Approach 1 - Using TPT/FEXP scripts
I know that TPT/FEXP scripts can be written to generate files for a table. How can I create a single script that generates files for all the tables in the database? (Creating 500-odd separate scripts for all the tables is impractical.)
Once you have this script ready, how is it executed in practice? Do we create a shell script and schedule it through some enterprise scheduler like Autosys/Tidal?
Once these files are generated, how do you split them on a Linux machine if each file is huge (the recommended file size for loading data into Snowflake is between 100 and 250 MB)?
How to move these files to Azure Data Lake?
Use COPY INTO / Snowpipe to load into Snowflake Tables.
Approach 2
Using ADF copy activity to extract data from Teradata and create files in ADLS.
Use COPY INTO/ Snowpipe to load into Snowflake Tables.
Which of these two is the better approach?
And in general, what are the challenges faced in each of these approaches?
Using ADF will be a much better solution. It also allows you to design the Data Lake as part of your solution.
You can design a generic solution that imports all the tables listed in a configuration. For this you can choose the recommended file format (Parquet), the size of the files, and the degree of parallel loading.
The main challenge you will encounter is probably the poorly working ADF connector for Snowflake; here you will find my recommendations on how to bypass the connector problem and how to use Data Lake Gen2:
Trouble loading data into Snowflake using Azure Data Factory
More about the recommendation on how to build Azure Data Lake Storage Gen2 structures can be found here: Best practices for using Azure Data Lake Storage Gen2
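To make the Snowflake end of that pipeline concrete, a COPY INTO from an Azure stage holding the Parquet files ADF produces might look roughly like the sketch below; the storage account, container, SAS token, table and column names are all assumptions for illustration:

-- External stage over the container that ADF writes Parquet files into
CREATE OR REPLACE STAGE teradata_hist_stage
  URL = 'azure://myaccount.blob.core.windows.net/history/'
  CREDENTIALS = (AZURE_SAS_TOKEN = '...')
  FILE_FORMAT = (TYPE = PARQUET);

-- Load one table's files, casting the Parquet columns explicitly
COPY INTO sales_history
FROM (
  SELECT $1:sale_id::NUMBER,
         $1:sale_date::DATE,
         $1:amount::NUMBER(18,2)
  FROM @teradata_hist_stage
)
PATTERN = '.*sales_history.*[.]parquet';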

How to load data from oracle and sql server to HAWQ using Spring XD

Hi, I have tables in Oracle and SQL Server. I need to load data from Oracle and SQL Server into Pivotal HAWQ using Spring XD. I couldn't find anything about this in the documentation.
You need to integrate Sqoop jobs with Spring XD. See the link below for Sqoop jobs with Spring XD:
https://github.com/tzolov/spring-xd-sqoop-job
You can use the jdbchdfs job to load the data into HDFS as CSV or any PXF-supported format. Then you can map the loaded data to HAWQ tables using PXF external table support. If you need the data in native HAWQ tables, you can then do an INSERT ... SELECT from the external table, or configure that INSERT ... SELECT as another batch job that loads the data from the PXF external table into the native HAWQ table.
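As a rough illustration of the PXF part only (the HDFS path, PXF host/port, profile and column list are assumptions that depend on your HAWQ/PXF version and on what the jdbchdfs job actually wrote):

-- External table over the CSV files the jdbchdfs job landed in HDFS
CREATE EXTERNAL TABLE ext_orders (
    order_id   INT,
    order_date DATE,
    amount     NUMERIC(18,2)
)
LOCATION ('pxf://namenode:51200/data/orders?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (DELIMITER ',');

-- Materialize into a native HAWQ table if needed
CREATE TABLE orders AS
SELECT * FROM ext_orders;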
Outsourcer is another open source solution that was originally designed to load data from Oracle and SQL Server into Greenplum but was enhanced some time ago to also support HAWQ.
All of the documentation and downloads are on http://www.pivotalguru.com/
And if you are interested in seeing the source code, here it is: https://github.com/pivotalguru/outsourcer

Redshift with SSIS/SSDT

Has anyone been successful using Amazon Redshift as a source or destination ODBC component in SQL Server Data Tools 2012?
I've installed the PostgreSQL drivers provided by Amazon and have successfully tested a connection in the Windows ODBC driver administrator but keep running into arcane error messages when I choose my saved DSN and try to pull a table listing.
Redshift is based on quite an old version of Postgres (8.0). Postgres has changed quite a bit since then and the Postgres tools have changed with it. When downloading any tools to use with Redshift you will probably need to use previous versions from several years ago.
The table listing problem is particularly annoying but I have yet to find a version of psql that can properly list Redshift tables. As an alternative you can use the INFORMATION_SCHEMA tables to find this kind of info, and in my opinion this is what SSIS/SSDT should be doing by default.
I would not expect SSIS to be able to load data into Redshift reliably, i.e. to act as a Redshift destination. This is because Redshift does not really support INSERT INTO as a way to bulk-load data: if you use INSERT INTO you will only be able to load ~10 rows per second. Redshift can only load data quickly from S3 or DynamoDB using the COPY command.
It's a similar story for all the other ETL tools I've tried, notably the open-source tools Pentaho PDI (aka Kettle) and Talend Open Studio. This is particularly annoying in Talend's case as they have Redshift components, but they actually try to use INSERT INTO for loading. Even Amazon's own ETL tool, Data Pipeline, does not yet have support for Redshift as a 'node'.
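For reference, the fast path referred to above is Redshift's COPY command reading from S3; a minimal sketch follows (the bucket, prefix, table and IAM role are placeholders, and older clusters used a CREDENTIALS clause instead of IAM_ROLE):

COPY analytics.orders
FROM 's3://my-bucket/orders/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
FORMAT AS CSV
IGNOREHEADER 1;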
I have been successful. Try installing both the 32-bit and 64-bit versions of the PostgreSQL ODBC drivers.
Also, in your Project Properties under 'Configuration Properties' > 'Debugging', set 'Run64BitRuntime' to False.
You can also try specifying the connection string in Connection Manager. For example:
Driver={PostgreSQL ANSI};
server=redshiftdb.d113klxjd4ac.us-west-2.redshift.amazonaws.com;uid=;database=;port=5432

export data from dot net nuke

I'm migrating some data from DNN to another platform, and need a way to extract database tables one by one in some useful format like XML, CSV etc.
Is there a way to dump and export the whole database or just a few tables at a time?
cheers
It is just a SQL Server database, so all the standard SQL Server tools will work (e.g. bcp).
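For example, a single table can be dumped to a CSV file with bcp, much like the bcp example earlier on this page (the server, database and table names here are placeholders):

bcp "SELECT * FROM dbo.Users" queryout users.csv -c -t"," -S MYSERVER -d MyDnnDatabase -T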
Also many DNN modules explicitly support import/export of their content.
