I have been using Oracle DB for a while now and really liked the "Bulk Collect" feature. It made processes faster. Now, I have started using Snowflake. Is there anything similar to "Bulk Processing" using collections as we have in Oracle DB?
(I am not talking about loading data from external tables/stages using Bulk Copy, but fetching a lot of data from large tables and processing them in stored procedures.)
Any pointers will be appreciated. Thanks in advance.
Related
I have a database in SQL Developer which pull data from an ERP tool and I would like to create a Data warehouse in order to connect it then to PowerBI.
It's my first time that I am doing all this process from the beginning so I am not so experienced.
So where are you suggesting to create the Data Warehouse (I was thinking on SSMS) and how can I connect it with PowerBI ?
My Data Warehouse will consist from some View of my tables and some Joins to get some data in the structure that I want since it is not possible to change anything in the DB.
Thanks in advance.
A "data warehouse" is just a database. The distinction is really more about the commonly used schema design, in the sense that a warehouse is often built along the lines of a star or snowflake design.
So if you already have a database that is extracting data from your ERP, there is nothing to stop you from pointing PowerBI directly at that and performing some analytics etc. If your intention is to start with this database, and then clone/extract/load this data into a new database which is a star/snowflake schema, then that's a much bigger exercises.
I am doing DB synchronization via MSSQL Replication utility which is performing well to me.
Now my objective: after transfer data to destination DB's table, remove transferred data from the source DB's table.
Any help would be highly appreciated.
If this is not feasible what are the other approaches to accomplish this?
I need to update a table in snowflake by taking data from oracle database.
Is there a way to connect to oracle database from snowflake?
If answer is NO how can i update the table in snowflake using data from oracle.
Not sure exactly what you are looking for here. The best way to get data into Snowflake is via the COPY INTO command, which would then allow you to update the Snowflake table with that data. If you are looking at ways to keep the 2 systems in-sync, then you might want to look into the various data replication tools that are in the marketplace. If this is a transactional update, then you can use a connector (ODBC, JDBC, Python, etc.) to update the data from one system to another. I wouldn't recommend that for bulk updates, though.
There are several ways you can integrate your data from oracle to snowflake. If you are familiar with ETL tool you can use any one of them or you can use any program language to extract and load.
good Day.
I need help. I want to transfer the data in Snowflake from Staging tables to Fact tables automatically, when data is available in Stage table. While moving data from Staging table to Fact tables, I have couple of Custom validations on each column and row.
Any idea how to do this in Snowflake.
If any one knows could you please suggest me...!
Thanks in Advance...!
There are many ways to do this and how you go about it depends on what tools you have available. The simplest way to do this without using tools outside of the Snowflake ecosystem would be:
On each of the staging tables you have, set up a stream on these tables (here is the Snowflake documentation on streams)
Create a task that runs on a schedule (here is the Snowflake doc on tasks) to pull from the streams and write into the fact table.
This is really a general data warehousing question rather than a Snowflake one. Here is some more documentation on building SCD type 2 dimensions also written by someone at Snowflake
Assuming "staging tables" refers to a Snowflake table and not a file in a Snowflake stage, I would recommend using a Stream and Task for this. A stream will identify the delta of data that needs to be loaded, and a Task can execute on a schedule and will only actually run something if there is data in the stream. Create a stored procedure that is executed in the Task to run your validations and Merge the outcome of those into your Fact.
I'm a SSIS Developer. I do lots of SQL stored procedure lookup concepts in SSIS. But when coming to Azure Data Factory I haven't any idea how to perform a lookup using a SQL stored procedure.
Could anyone please guide me on this?
Thanks in advance !
Jay
Azure Data Factory (ADF) is more of an ELT tool rather than ETL, therefore direct lookups are not supported. Instead, this type of operation, along with other transforms is pushed down into the compute you are actually using. For example, if you are moving data to SQL Server, Azure SQL Database or Azure SQL Data Warehouse, you would ensure all data is on the same server and use a Stored Procedure task to execute the lookups using T-SQL and joins. If you are using Azure Data Lake Analytics (ADLA) you would use the U-SQL Activity to run U-SQL or execute ADLA stored procedures, again doing lookups via joins or custom U-SQL code such as Combiner, Applier, Reducer. In fact you can use any of the ADF compute options like SQL, HDInsight (including Hive, Pig, Map Reduce, Streaming and Spark script), Machiine Learning or custom .net activities.
So you need to think about things differently with ADF. Have a look through this article to gain greater understanding of transforming data in ADF:
Transform data in Azure Data Factory
https://learn.microsoft.com/en-us/azure/data-factory/data-factory-data-transformation-activities
As an aside, I would rarely use Lookups in SSIS as performance in early versions used to be poor. Although this has been improved in later versions, generally if you can do it in SQL you probably should. This pattern harnesses the power of SQL Server, rather than dragging data up into the SSIS pipeline, eg for the purposes of lookups (which are essentially joins) and pushing the data back out again. I reserve Data Flow transformations mainly when non-relational data is involved, eg xml or joining your email server with relational data. This is my personal view anyway : )