I need to update a table in Snowflake using data from an Oracle database.
Is there a way to connect to an Oracle database from Snowflake?
If the answer is no, how can I update the table in Snowflake using data from Oracle?
Not sure exactly what you are looking for here. Snowflake can't query Oracle directly, so the best way to get data into Snowflake is to export it and load it via the COPY INTO command, which then lets you update the Snowflake table with that data. If you are looking for ways to keep the two systems in sync, you might want to look into the various data replication tools in the marketplace. If this is a transactional update, you can use a connector (ODBC, JDBC, Python, etc.) to move the data from one system to the other, though I wouldn't recommend that for bulk updates.
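For the bulk path, here is a rough sketch of what the COPY INTO / MERGE pattern can look like, assuming you have already exported the Oracle table to CSV files; the stage, file format, and table names (oracle_extract_stage, customers, customers_staging) are placeholders for your own objects:

    -- Placeholder names throughout; adjust the stage, file format, and tables to your environment.
    CREATE OR REPLACE FILE FORMAT oracle_csv_format
      TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1;

    CREATE OR REPLACE STAGE oracle_extract_stage
      FILE_FORMAT = (FORMAT_NAME = 'oracle_csv_format');

    -- Upload the Oracle extract with SnowSQL, e.g.:
    --   PUT file:///tmp/customers.csv @oracle_extract_stage;
    -- then bulk load it into a staging table.
    CREATE OR REPLACE TABLE customers_staging LIKE customers;

    COPY INTO customers_staging
      FROM @oracle_extract_stage
      PATTERN = '.*customers.*';

    -- Finally, update the target table from the freshly loaded rows.
    MERGE INTO customers AS tgt
    USING customers_staging AS src
      ON tgt.customer_id = src.customer_id
    WHEN MATCHED THEN UPDATE SET
      tgt.customer_name = src.customer_name,
      tgt.last_updated  = src.last_updated
    WHEN NOT MATCHED THEN INSERT (customer_id, customer_name, last_updated)
      VALUES (src.customer_id, src.customer_name, src.last_updated);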
There are several ways to integrate your data from Oracle into Snowflake. If you are familiar with an ETL tool you can use one of those, or you can use any programming language to extract and load the data.
I am looking for a way in Informatica to pull data from a table in a database, load it into Snowflake, and then move on to the next table in that same database, repeating that for the remaining tables.
We currently have this running in Matillion, where an orchestration grabs all of the table names in a database and then loops through each of those tables to send the data into Snowflake.
My team and I have tried asking Informatica Global Support, but they have not been very helpful in figuring out how to accomplish this. They have suggested things like Dynamic Mapping, which I do not think will work for our particular case, since we are essentially just trying to get data from one database into a Snowflake database and do not need to do any other transformations.
Please let me know if any additional clarification is needed.
Dynamic Mapping Task is your answer. You create one mapping, with or without transformations, as you need. Then you set up a Dynamic Mapping Task to execute that mapping across the whole set of your 60+ different sources and targets.
Please note that this is available as part of the Cloud Data Integration module of IICS. It's not available in PowerCenter.
Good day.
I need help. I want to transfer data in Snowflake from staging tables to fact tables automatically, whenever data is available in a staging table. While moving data from the staging tables to the fact tables, I have a couple of custom validations on each column and row.
Any idea how to do this in Snowflake?
If anyone knows, could you please suggest an approach?
Thanks in advance!
There are many ways to do this and how you go about it depends on what tools you have available. The simplest way to do this without using tools outside of the Snowflake ecosystem would be:
On each of the staging tables you have, set up a stream on that table (see the Snowflake documentation on streams).
Create a task that runs on a schedule (see the Snowflake documentation on tasks) to pull from the streams and write into the fact table; a minimal sketch follows below.
This is really a general data warehousing question rather than a Snowflake one. There is also further material on building SCD Type 2 dimensions, written by someone at Snowflake, that is worth a read.
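Here is a minimal sketch of the stream-plus-task approach; the table, column, and warehouse names (orders_staging, orders_fact, etl_wh) are made up for illustration, and the WHERE clause stands in for whatever validations you need:

    -- Placeholder table, column, and warehouse names.
    -- The stream records new rows in the staging table since it was last consumed.
    CREATE OR REPLACE STREAM orders_staging_stream ON TABLE orders_staging;

    -- The task wakes up every 5 minutes but only runs when the stream has data.
    CREATE OR REPLACE TASK load_orders_fact
      WAREHOUSE = etl_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STAGING_STREAM')
    AS
      INSERT INTO orders_fact (order_id, customer_id, order_total, load_ts)
      SELECT order_id, customer_id, order_total, CURRENT_TIMESTAMP()
      FROM orders_staging_stream
      WHERE METADATA$ACTION = 'INSERT';   -- your validations/filters go in this SELECT

    -- Tasks are created suspended, so resume it to start the schedule.
    ALTER TASK load_orders_fact RESUME;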
Assuming "staging tables" refers to a Snowflake table and not a file in a Snowflake stage, I would recommend using a Stream and Task for this. A stream will identify the delta of data that needs to be loaded, and a Task can execute on a schedule and will only actually run something if there is data in the stream. Create a stored procedure that is executed in the Task to run your validations and Merge the outcome of those into your Fact.
Our team is trying to build an ETL into Redshift to be our data warehouse for some reporting. We are using Microsoft SQL Server and have partitioned our database out into 40+ data sources. We are looking for a way to pipe the data from all of these identical data sources into one Redshift DB.
Looking at AWS Glue, it doesn't seem possible to achieve this. Since they open up the job script to be edited by developers, I was wondering if anyone else has had experience with looping through multiple databases and transferring the same table into a single data warehouse. We are trying to avoid having to create a job for each database... unless we can programmatically loop through and create multiple jobs for each database.
We've taken a look at DMS as well, which is helpful for getting the schema and current data over to Redshift, but it doesn't seem like it would handle the multiple partitioned data source issue either.
This sounds like an excellent use-case for Matillion ETL for Redshift.
(Full disclosure: I am the product manager for Matillion ETL for Redshift)
Matillion is an ELT tool - it will Extract data from your (numerous) SQL Server databases and Load it, via an efficient Redshift COPY, into staging tables (which can be stored inside Redshift in the usual way, or held on S3 and accessed from Redshift via Spectrum). From there you can add Transformation jobs to clean/filter/join (and much more!) into nice queryable star schemas for your reporting users.
If the table schemas on your 40+ databases are very similar (your question doesn't clarify how you are breaking your data down into those servers - horizontal or vertical), you can parameterise the connection details in your jobs and use iteration to run them over each source database, either serially or with a level of parallelism.
Pushing down transformations to Redshift works nicely because all of those transformation queries can utilize the power of a massively parallel, scalable compute architecture. Workload Management configuration can be used to ensure ETL and User queries can happen concurrently.
Also, you may have other sources of data you want to mash-up inside your Redshift cluster, and Matillion supports many more - see https://www.matillion.com/etl-for-redshift/integrations/.
You can use AWS DMS for this.
Steps:
Set up and configure a DMS replication instance.
Set up a target endpoint for Redshift.
Set up a source endpoint for each SQL Server instance - see https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html
Set up a task for each SQL Server source; you can specify the tables to copy/synchronise, and you can use a transformation to specify which schema name(s) on Redshift you want to write to.
You will then have all of the data in identical schemas on Redshift.
If you want to query all of those together, you can do that either by running some transformation code inside Redshift to combine them and make new tables, or you may be able to use views.
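For the view option, something like the following (schema names are made up, assuming each DMS task writes its source database into its own Redshift schema):

    -- Hypothetical schema names: each DMS task writes one source database into its own Redshift schema.
    CREATE OR REPLACE VIEW reporting.orders_all AS
    SELECT 'shard_01' AS source_db, o.* FROM shard_01.orders o
    UNION ALL
    SELECT 'shard_02' AS source_db, o.* FROM shard_02.orders o
    UNION ALL
    SELECT 'shard_03' AS source_db, o.* FROM shard_03.orders o;
    -- ...and so on for the remaining schemas; the statement itself can be generated from svv_tables.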
I'm an SSIS developer. I do a lot of SQL stored procedure lookups in SSIS, but when it comes to Azure Data Factory I have no idea how to perform a lookup using a SQL stored procedure.
Could anyone please guide me on this?
Thanks in advance!
Jay
Azure Data Factory (ADF) is more of an ELT tool than an ETL tool, so direct lookups are not supported. Instead, this type of operation, along with other transforms, is pushed down into the compute you are actually using. For example, if you are moving data to SQL Server, Azure SQL Database or Azure SQL Data Warehouse, you would ensure all data is on the same server and use a Stored Procedure activity to execute the lookups using T-SQL and joins. If you are using Azure Data Lake Analytics (ADLA) you would use the U-SQL activity to run U-SQL or execute ADLA stored procedures, again doing lookups via joins or custom U-SQL code such as a Combiner, Applier, or Reducer. In fact you can use any of the ADF compute options, like SQL, HDInsight (including Hive, Pig, Map Reduce, Streaming and Spark scripts), Machine Learning, or custom .NET activities.
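As a rough illustration of the stored-procedure approach (table and procedure names are invented, and it assumes Copy activities have already landed both the incoming rows and the reference table in the same Azure SQL database):

    -- Invented table and procedure names; assumes staging.Sales and dbo.DimCustomer
    -- already live in the same Azure SQL database.
    CREATE PROCEDURE dbo.usp_LoadSalesWithCustomerLookup
    AS
    BEGIN
        SET NOCOUNT ON;

        -- The SSIS-style lookup becomes a plain join, executed where the data lives.
        INSERT INTO dbo.FactSales (CustomerKey, OrderId, Amount)
        SELECT  c.CustomerKey,
                s.OrderId,
                s.Amount
        FROM    staging.Sales AS s
        JOIN    dbo.DimCustomer AS c
                ON c.CustomerBusinessKey = s.CustomerId;
    END;
    -- In ADF, point a Stored Procedure activity at this procedure and chain it after the Copy activity.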
So you need to think about things differently with ADF. Have a look through this article to gain greater understanding of transforming data in ADF:
Transform data in Azure Data Factory
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-data-transformation-activities
As an aside, I would rarely use Lookups in SSIS, as performance in early versions used to be poor. Although this has been improved in later versions, generally if you can do it in SQL you probably should. This pattern harnesses the power of SQL Server rather than dragging data up into the SSIS pipeline, e.g. for the purposes of lookups (which are essentially joins), and pushing the data back out again. I reserve Data Flow transformations mainly for when non-relational data is involved, e.g. XML, or joining your email server with relational data. This is my personal view anyway : )
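For example, the classic SSIS Lookup with its match and no-match outputs reduces to a LEFT JOIN (names below are made up):

    -- Invented names; a LEFT JOIN reproduces the Lookup transformation's match and no-match outputs.
    SELECT  s.OrderId,
            s.CustomerId,
            c.CustomerKey,
            CASE WHEN c.CustomerKey IS NULL THEN 'no match' ELSE 'match' END AS lookup_status
    FROM    staging.Sales AS s
    LEFT JOIN dbo.DimCustomer AS c
           ON c.CustomerBusinessKey = s.CustomerId;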
How do I migrate table schemas from one DB to another without damaging the data in the destination DB?
I want to move my data from my deployed development copy to the live database and would like to run some scripts to do it. I need to upgrade the schema for some tables and create others. Right now I figure I'll have to check each of the tables in the destination DB against the development copy and then copy over the new tables, but that will be quite tedious. Are there any suggestions on how I can do this?
Check out the SQL Compare tool by Redgate:
http://www.red-gate.com/products/SQL_Compare/index.htm
You should be able to compare both the databases and then generate scripts based on the differences.
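Purely to illustrate the kind of script such a tool produces (object names are invented), the generated output is typically additive DDL that upgrades the schema without touching existing rows:

    -- Illustrative only: add a missing column and create a missing table, leaving existing data intact.
    ALTER TABLE dbo.Customer ADD MiddleName NVARCHAR(50) NULL;

    IF OBJECT_ID('dbo.AuditLog', 'U') IS NULL
    BEGIN
        CREATE TABLE dbo.AuditLog (
            AuditLogId INT IDENTITY(1,1) PRIMARY KEY,
            TableName  SYSNAME   NOT NULL,
            ChangedAt  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
        );
    END;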
You can use a tool to generate the scripts. Redbrick has one, and Erwin can do deltas as well.
Another one besides the excellent Red-Gate SQL Compare is ApexSQL's SQL Diff.
ApexSQL also has a SQL Data Diff if you also need to compare and synchronize data from various sources.
Highly recommended!
Marc
In addition to the RedGate software mentioned above, Embarcadero Change Manager can do both schema and data instance compares, then generate alter scripts for the schemas and DML scripts for the data to bring two databases in sync.