Is there any way to stream data from Snowflake to Oracle other than Informatica - database

I am working on a requirement where I need to stream data from Snowflake to Oracle for some value-added processing.
The few methods I know of are unloading files to S3 and then loading them into Oracle, or using Informatica.
Both of these approaches require some effort, so is there any simpler way of streaming data from Snowflake to Oracle?

Snowflake cannot connect directly to Oracle. You'll need some tooling or code "in between" the two.
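If you do go the unload-to-S3 route mentioned in the question, the Snowflake half is a single COPY INTO <location> statement. A minimal sketch, assuming an external stage my_s3_stage already points at your S3 bucket (the stage, database, schema, and table names are placeholders):

COPY INTO @my_s3_stage/orders/
FROM my_db.my_schema.orders
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
HEADER = TRUE
MAX_FILE_SIZE = 250000000;  -- keep individual files to roughly 250 MB

The Oracle side then has to pick the files up from S3 with whatever loader you prefer (SQL*Loader, external tables, etc.), so it is still a batch pipeline rather than true streaming.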
I came across this data migration tool the other day, and it appears to support both Snowflake and Oracle: https://github.com/markddrake/YADAMU---Yet-Another-DAta-Migration-Utility/releases/tag/v1.0
-Paul-

Related

Historical data migration from Teradata to Snowflake

What are the steps to be taken to migrate historical data load from Teradata to Snowflake?
Imagine there is 200TB+ of historical data combined from all tables.
I am thinking of two approaches, but I don't have enough expertise and experience in how to execute them, so I am looking for someone to fill in the gaps and offer some suggestions.
Approach 1 - Using TPT/FEXP scripts
I know that TPT/FEXP scripts can be written to generate files for a table. How can I create a single script that can generate files for all the tables in the database? (Creating 500-odd individual scripts is impractical.)
Once you have this script ready, how is it actually executed? Do we create a shell script and schedule it through an enterprise scheduler like Autosys/Tidal?
Once these files are generated, how do you split them on a Linux machine if each file is huge (the recommended file size for loading into Snowflake is between 100 and 250 MB)?
How to move these files to Azure Data Lake?
Use COPY INTO / Snowpipe to load into Snowflake Tables.
Approach 2
Using ADF copy activity to extract data from Teradata and create files in ADLS.
Use COPY INTO/ Snowpipe to load into Snowflake Tables.
Which of these two is the best suggested approach?
In general, what are the challenges faced in each of these approaches?
Using ADF will be a much better solution. It also allows you to design the data lake as part of your solution.
You can design a generic solution that imports all the tables listed in a configuration. For this you can choose the recommended file format (Parquet), the size of the files, and parallel loading; the final load into Snowflake is a COPY INTO like the sketch below.
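For the final load step (COPY INTO / Snowpipe), the sketch below is roughly all that is needed, assuming an external stage my_adls_stage over your ADLS Gen2 container and Parquet files written by ADF (the stage and table names are placeholders):

COPY INTO my_db.staging.customer
FROM @my_adls_stage/customer/
FILE_FORMAT = (TYPE = PARQUET)
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;  -- map Parquet columns to table columns by name

Snowpipe wraps essentially the same COPY statement in a pipe object if you want continuous loading instead of scheduled batches.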
The main challenge you will encounter is probably the poorly working ADF connector for Snowflake. Here you will find my recommendations on how to work around the connector problem and how to use Data Lake Gen2:
Trouble loading data into Snowflake using Azure Data Factory
More recommendations on how to structure Azure Data Lake Storage Gen2 can be found here: Best practices for using Azure Data Lake Storage Gen2

Redshift with SSIS/SSDT

Has anyone been successful using Amazon Redshift as a source or destination ODBC component in SQL Server Data Tools 2012?
I've installed the PostgreSQL drivers provided by Amazon and have successfully tested a connection in the Windows ODBC driver administrator but keep running into arcane error messages when I choose my saved DSN and try to pull a table listing.
Redshift is based on quite an old version of Postgres (8.0). Postgres has changed quite a bit since then and the Postgres tools have changed with it. When downloading any tools to use with Redshift you will probably need to use previous versions from several years ago.
The table listing problem is particularly annoying but I have yet to find a version of psql that can properly list Redshift tables. As an alternative you can use the INFORMATION_SCHEMA tables to find this kind of info, and in my opinion this is what SSIS/SSDT should be doing by default.
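For example, a plain INFORMATION_SCHEMA query that lists user tables works on Redshift just as it does on Postgres:

-- list user tables without relying on psql's catalog shortcuts
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY table_schema, table_name;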
I would not expect SSIS to be able to load data into Redshift reliably, i.e. create a Redshift destination. This is because Redshift does not really support INSERT INTO as a way to load data. If you use INSERT INTO you will only be able to load ~10 rows per second. Redshift can only load data quickly from S3 or DynamoDB using the COPY command.
It's a similar story for all the other ETL tools I've tried, notably the open-source tools Pentaho PDI (aka Kettle) and Talend Open Studio. This is particularly annoying in Talend's case, as they have Redshift components but they actually try to use INSERT INTO for loading. Even Amazon's own ETL tool, Data Pipeline, does not yet have support for Redshift as a 'node'.
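If you do need to load Redshift from SSIS, one workaround is to have the package stage flat files to S3 and then run the COPY through an Execute SQL Task. A minimal sketch of the COPY itself (the bucket path, table, and role ARN are placeholders; older clusters use the CREDENTIALS clause with access keys instead of IAM_ROLE):

COPY my_schema.my_table
FROM 's3://my-bucket/unload/my_table/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-load-role'
DELIMITER '|'
GZIP;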
I have been successful. Try installing both the 32-bit and 64-bit versions of the PostgreSQL ODBC drivers.
Also, in your Project Properties under 'Configuration Properties' > 'Debugging', set 'Run64BitRuntime' to False.
You can also try specifying the connection string in Connection Manager. For example:
Driver={PostgreSQL ANSI};
server=redshiftdb.d113klxjd4ac.us-west-2.redshift.amazonaws.com;uid=;database=;port=5432

C# How to Synch Data Between Database Instances Over the Internet

We have an application that requires our customers to have a SQL server instance on site. At their request, the application needs to synchronize the data in their database with a copy in our datacenter.
We're using .Net 3.5 SP1. We need to synchronize the data exactly, including IDENTITY columns.
We'd prefer to use something like LINQ to SQL that would let us make some simple select and insert/update calls against mapped entities. However, the IDENTITY columns seem to be a problem with LINQ and similar approaches.
We can do this all with built-up SQL statements and turn IDENTITY INSERT on / off as needed, but I'd prefer a more elegant solution.
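For reference, the 'built-up SQL' version of that looks roughly like this per table (dbo.Customer and its columns are hypothetical):

SET IDENTITY_INSERT dbo.Customer ON;

INSERT INTO dbo.Customer (CustomerId, Name, ModifiedDate)
VALUES (42, 'Acme Corp', '2009-06-01');  -- keep the source row's identity value

SET IDENTITY_INSERT dbo.Customer OFF;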
Thanks!
** Edit - We DO need to write our own solution, and we do need to use .Net 3.5 SP1 to do it. I won't spend your time explaining all the reasons why, but please limit suggestions to options within the .Net playground.
Microsoft Sync Framework could be your solution. This is the framework description from Microsoft:
Microsoft Sync Framework is a data synchronization platform from Microsoft that can be used to synchronize data across multiple data stores. Sync Framework includes a transport-agnostic architecture, into which data store-specific synchronization providers, modelled on the ADO.NET data provider API, can be plugged in.
Sync Framework is a comprehensive data synchronization solution that enables developers to build solutions that support synchronization of any database, on any data protocol over any network topology. msdn.microsoft.com
For your convenience, here is a link to a good tutorial on the subject.
If it is just a couple of tables that need to be synchronized and there is not a lot of data in the tables (now and in the future), you could develop some sort of bulk-copy routine from your servers and a bulk-insert routine on the customer's server.
Since you said you can't use SQL Server replication services or SSIS, perhaps a backup/restore procedure could be written. You could take a scheduled backup of your database and make it available to calling applications, which could then copy the backup, restore it to another instance on the customer's server, and then pull all the data you need via any number of methods so that it exists locally on the customer's servers.
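If you go that route, the backup/restore pair itself is just two T-SQL statements; a minimal sketch with hypothetical database, path, and logical file names:

BACKUP DATABASE CentralCopy
TO DISK = N'C:\Backups\CentralCopy.bak'
WITH INIT;

-- on the customer's instance, after copying the .bak file down
RESTORE DATABASE CentralCopy
FROM DISK = N'D:\Incoming\CentralCopy.bak'
WITH MOVE 'CentralCopy' TO N'D:\Data\CentralCopy.mdf',
     MOVE 'CentralCopy_log' TO N'D:\Data\CentralCopy_log.ldf',
     REPLACE;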
Beyond that, I think you may be asking for a maintenance and synchronization nightmare if you can't base your solution on tools that are made to do this sort of thing.

Transferring data between different DBMS's

I would like to transfer a whole database from Informix to Oracle. We have an application which works on both databases; one of our customers is moving from Informix to Oracle and needs to transfer the whole database (the structure is the same).
We also often need to transfer data between Oracle/MSSQL/Informix, sometimes only one table and not the whole database.
Does anybody know about any good program which does this kind of job?
The Pentaho Data Integration ETL tools (also known under the former name "Kettle") are available as open source and cover cross-database migration and many other use cases.
From their data sheet:
Common Use Cases
Data warehouse population with built-in support for slowly changing dimensions and junk dimensions
Export of database(s) to text file(s) or other databases
Import of data into databases, ranging from text files to Excel sheets
Data migration between database applications
...
A list of input / output data formats can be found in the accepted answer of this question: Does anybody know the list of Pentaho Data Integration (Kettle) connectors list?
It supports all databases with a JDBC driver, which means most of them.
Check this question of mine; it includes some very good ideas: Searching for (freeware) database migration tool
You could give the Oracle Migration Workbench a try, see http://download.oracle.com/docs/html/B15858_01/toc.htm. If you want to read Informix data into Oracle on a regular basis, using Heterogeneous Services might be a better option. Check for hs4odbc or dg4odbc, depending on the Oracle release you have.
I hope this helps,
Ronald.
I have done this in the past and it is not a trivial task. We ended up writing each table out to a pipe-delimited flat file and reloading each table into Oracle with Oracle SQL*Loader. There were a ton of Perl scripts to scrub the source data and shell scripts to automate the process as much as possible and run things in parallel.
Gotchas that can come up:
1. Pick a delimiter that is as unique as possible.
2. Try to find data types that match the Informix ones as closely as possible, e.g. date vs. timestamp.
3. Try to get the data as clean as possible before dumping out the flat files.
4. HS (Heterogeneous Services) will most likely be too slow.
This was done years ago. You may want to investigate GoldenGate (now owned by Oracle), which may help with the process (GoldenGate did not exist when I did it).
Another idea is to use an ETL tool to read from Informix and dump the data into Oracle (Informatica comes to mind).
Good luck :)
sqlldr - Oracle's import utility
Here's what I did to transfer 50 TB of data from MySQL to Oracle: I generated CSV files from MySQL and used the sqlldr utility to load all the data from the files into the Oracle DB. It is the fastest way to import data. I researched this for a few weeks, ran a lot of benchmark test cases, and sqlldr is hands down the best and fastest way to load data into Oracle.
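For a rough idea of what this looks like per table, here is a minimal SQL*Loader control file sketch for a comma-separated file (the table, column, and file names are hypothetical):

-- orders.ctl, invoked with something like: sqlldr userid=app/secret control=orders.ctl direct=true
LOAD DATA
INFILE 'orders.csv'
INTO TABLE orders
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
( order_id,
  customer_id,
  order_date DATE "YYYY-MM-DD",
  amount )

Direct-path loading (direct=true) and running one sqlldr session per file are a large part of why this approach is so fast.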

PostgreSQL -> Oracle replication

I'm looking for a tool to export data from a PostgreSQL DB to an Oracle data warehouse. I'm really looking for a heterogeneous DB replication tool rather than an export->convert->import solution.
Continuent Tungsten Replicator looks like it would do the job, but PostgreSQL support won't be ready for another couple months.
Are there any open-source tools out there that will do this? Or am I stuck with some kind of scheduled pg_dump/SQL*Loader solution?
You can create a database link from Oracle to Postgres (this is called heterogeneous connectivity). This makes it possible to select data from Postgres with a select statement in Oracle. You can use materialized views to schedule and store the results of those selects.
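A minimal sketch of that setup, assuming a Database Gateway / Heterogeneous Services ODBC DSN named PGDSN has already been configured for the Postgres side (the link, user, and table names are placeholders):

CREATE DATABASE LINK pg_link
  CONNECT TO "pg_user" IDENTIFIED BY "pg_password"
  USING 'PGDSN';

-- refresh hourly and keep a local Oracle copy of the Postgres table
CREATE MATERIALIZED VIEW orders_mv
  REFRESH COMPLETE START WITH SYSDATE NEXT SYSDATE + 1/24
  AS SELECT * FROM "orders"@pg_link;

Note the quoted identifiers: through the gateway, Postgres's lower-case object names generally have to be quoted exactly.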
It sounds like SymmetricDS would work for your scenario. SymmetricDS is web-enabled, database independent, data synchronization/replication software. It uses web and database technologies to replicate tables between relational databases in near real time.
Sounds like you want an ETL (extract, transform, load) tool. There are a lot of open-source options; Enhydra Octopus and Talend Open Studio are a couple I've come across.
In general, ETL tools offer you more flexibility than a straight replication option.
Some offer scheduling, data quality, and data lineage.
Consider using the Confluent Kafka Connect JDBC sink and source connectors if you'd like to replicate data changes across heterogeneous databases in real time.
The source connector can select the entire database, particular tables, or the rows returned by a provided query, and send the data as Kafka messages to your Kafka broker. The source connector can calculate the diffs based on an incrementing id column, a timestamp column, or be run in bulk mode where the entire contents are recopied periodically. The sink can read these messages, optionally check them against an Avro or JSON schema, and populate the destination database with the results. It's all free, and several sink and source connectors exist for many relational and non-relational databases.
*One major caveat - some JDBC Kafka connectors cannot capture hard deletes.
To get around that limitation, you can use a log-based change data capture connector such as Debezium (http://www.debezium.io); see also
Delete events from JDBC Kafka Connect Source.
