Compare millions of records from Oracle to SQL Server

I have an Oracle database and a SQL Server database. There is one table, say Inventory, which contains millions of rows in both databases, and it keeps growing.
I want to compare the Oracle table data with the SQL Server data on a daily basis to find out which records are missing from the SQL Server table.
Which is the best approach for this?
Create an SSIS package.
Create a Windows service.
I want to achieve this with as little time and as few resources as possible.
E.g.: 18 million records in Oracle and 16-17 million in SQL Server.
This situation with two different databases arises because there are two different applications, one online and one offline.
EDIT: How about connecting to SQL Server from Oracle through Oracle Database Gateway for SQL Server, and then:
1) Querying SQL Server directly from Oracle to insert the missing records into SQL Server the first time.
2) Creating a trigger on Oracle which fires when a record is deleted from Oracle and inserts the deleted record into a new Oracle table.
3) Creating an SSIS package to map the newly created Oracle table to SQL Server and update the SQL Server records. This way only a few records have to be processed daily through SSIS.
What do you think of this approach?

I would create an SSIS package and load the data from the Oracle table using a Data Flow with an OLE DB Source. If you have SQL Server Enterprise, the Attunity connectors are a bit faster.
Then I would load the keys from the SQL Server table into a Lookup transformation, match the two sources on the key, and direct unmatched rows into a separate output.
Finally I would direct the unmatched-rows output to an OLE DB Command to update the SQL Server table.
This SSIS package will require a lot of memory, but as the matching is done in memory with minimal I/O, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server table.
SSIS also has the advantage that lots of other transformation functions are available if you need them later.
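If you would rather do the same matching set-based on the SQL Server side (after staging the Oracle keys into a table), the equivalent anti-join is short; a minimal sketch, where the dbo.OracleKeys staging table and the InventoryKey column are assumptions:
-- T-SQL: keys present in the staged Oracle data but missing from SQL Server.
-- dbo.OracleKeys and InventoryKey are hypothetical names.
SELECT ok.InventoryKey
FROM dbo.OracleKeys AS ok
LEFT JOIN dbo.Inventory AS inv
       ON inv.InventoryKey = ok.InventoryKey
WHERE inv.InventoryKey IS NULL;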

What you basically want to do is replication from Oracle to SQL Server.
You could do this with SSIS, a Windows service, or indeed a multitude of other platforms.
The real trick is using the correct design pattern.
There are two general design patterns
Snapshot Replication
You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or to compare on Oracle, but not yet a suggestion to compare on SQL Server, although this is also valid).
You are comparing 18 million records here, so this is a lot of work.
Differential replication
You record the changes in the publisher (i.e. Oracle) since the last replication, then you apply those changes to the subscriber (i.e. SQL Server).
You can do this manually by implementing triggers and log tables on the Oracle side (a sketch follows below), then using a regular ETL process (SSIS, command-line tools, text files, whatever), probably scheduled in SQL Agent, to apply the changes to SQL Server.
Or you could use the out-of-the-box replication capability to set up Oracle as a publisher and SQL Server as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx
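A minimal sketch of the trigger/log-table half of that approach, assuming an Inventory table with a numeric primary key (all object names are hypothetical):
-- Oracle: a change-log table and a trigger that records row changes.
CREATE TABLE inventory_change_log (
  inventory_id NUMBER NOT NULL,
  change_type  CHAR(1) NOT NULL,   -- 'I' = insert, 'D' = delete
  changed_at   DATE DEFAULT SYSDATE
);

CREATE OR REPLACE TRIGGER trg_inventory_log
AFTER INSERT OR DELETE ON inventory
FOR EACH ROW
BEGIN
  IF INSERTING THEN
    INSERT INTO inventory_change_log (inventory_id, change_type)
    VALUES (:NEW.inventory_id, 'I');
  ELSE
    INSERT INTO inventory_change_log (inventory_id, change_type)
    VALUES (:OLD.inventory_id, 'D');
  END IF;
END;
/
The nightly ETL then only has to read inventory_change_log, apply those rows to SQL Server, and delete the log entries it has processed.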
You're going to have to try a few of these and see what works for you.
Given this objective:
I want to consume less resource to achieve this functionality which takes less time and less resource
transactional replication is far more efficient but complicated. For maintenance purposes, which platforms (.Net, SSIS, Python etc.) are you most comfortable with?

Other alternatives:
If you can use Oracle Database Gateway for SQL Server, then you do not need to transfer data at all and can run the query directly.
If you can't use the Oracle gateway, you can use Pentaho Data Integration or another ETL tool to compare the tables and get the results. It is easy to use.

I think the best approach is using the Oracle gateway. Just follow the steps below; I have experience with a similar setup.
Install and Configure Oracle Database Gateway for SQL Server.
https://docs.oracle.com/cd/B28359_01/gateways.111/b31042/installsql.htm
Now you can create a database link (dblink) from Oracle to SQL Server.
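For illustration, the link might be created like this (the login, password, and gateway TNS alias are assumptions):
-- Oracle: database link pointing at the gateway.
-- "sql_login"/"password" and gateway_alias are hypothetical.
CREATE DATABASE LINK dblink_name
  CONNECT TO "sql_login" IDENTIFIED BY "password"
  USING 'gateway_alias';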
Create a procedure which finds the records missing from the SQL Server database and inserts them from the Oracle database.
For example, you can use this statement inside your procedure.
INSERT INTO "dbo"."sql_server_table"#dblink_name("column1","column2"...."column5")
VALUES
(
select column1,column2....column5 from oracle_table
minus
select "column1","column2"...."column5" from "dbo"."sql_server_table"#dblink_name
)
Create a scheduler job which executes the procedure daily.
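A minimal DBMS_SCHEDULER job for that, assuming the procedure is named SYNC_MISSING_RECORDS (a hypothetical name):
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'SYNC_MISSING_RECORDS_JOB',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'SYNC_MISSING_RECORDS',  -- the procedure created above
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2',  -- run daily at 02:00
    enabled         => TRUE);
END;
/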
When both databases are online, the missing records will be inserted into SQL Server. Otherwise the scheduled run fails, and you can execute the procedure manually later.
It takes minimal resources.

I would suggest a homemade ETL solution.
Schedule an Oracle job to export the source table data (daily, based on the application logic) to a plain CSV file.
Schedule a SQL Server job (with an acceptable delay after the first Oracle job) to read this CSV file and import it into a staging table inside SQL Server using BULK INSERT.
The last part of the SQL Server job reads the staging table data and applies the logic (insert/update the target table); a sketch follows below. I suggest having another table to store the results of this daily job.
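A hedged T-SQL sketch of the import-and-apply steps (the file path, table, and column names are all assumptions):
-- Load the CSV into the staging table.
BULK INSERT dbo.InventoryStaging
FROM 'C:\etl\inventory.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');

-- Apply: insert rows that are missing from the target table.
INSERT INTO dbo.Inventory (InventoryKey, Col1, Col2)
SELECT s.InventoryKey, s.Col1, s.Col2
FROM dbo.InventoryStaging AS s
LEFT JOIN dbo.Inventory AS t ON t.InventoryKey = s.InventoryKey
WHERE t.InventoryKey IS NULL;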

Related

Fastest way to copy large amounts of data from Oracle to SQL Server

I need to copy large amounts of data from an Oracle database to a SQL Server database. What is the fastest way to do this?
I am looking at data that takes 60-70 GB of storage in Oracle. There are no particular restrictions on the method I use. I can use SQL Server Management Studio, the SQL Server import/export program, a .NET app, the developer interface in Oracle, third-party tools, or ----. I just need to move the data as quickly as possible.
The data is geographically organized. The data for each state is updated separately in the Oracle database and can be moved over to SQL Server on its own, so the entire volume of data will rarely be moved all at once.
So what suggestions would people have?
The fastest way to insert large amounts of data into SQL Server is with SQL Server bulk insert. Common bulk insert techniques are:
T-SQL BULK INSERT statement
BCP command-line utility
SSIS package OLE DB destination with the fast load option
ODBC bcp API from unmanaged code
OLE DB IRowsetFastLoad from unmanaged code
SqlBulkCopy from a .NET application
T-SQL BULK INSERT and the command-line BCP utility use a flat-file source, so the implication is that you'll first need to export the data to files. The other methods can use Oracle SELECT query results directly without an intermediate file, which should perform better overall as long as source/destination network bandwidth and latency aren't a concern.
With SSIS, one would typically create a data flow task for each table to be copied, with an OLE DB source (Oracle) and an OLE DB destination (SQL Server). The Oracle source provider can be downloaded separately depending on the SSIS version; the latest is the Microsoft Connector v4.0 for Oracle. The SSMS Import wizard can be used to generate an SSIS package for the task, which may be run immediately and/or saved and customized as desired. For example, you could create a package variable for the state to be copied and use it in the source SELECT query and in a target DELETE query prior to refreshing the data; that would allow the same package to be reused for any state.
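For illustration, the two parameterized statements the state variable might drive (table and column names are assumptions; ? is the OLE DB parameter placeholder SSIS maps the variable onto):
-- Target pre-delete (SQL Server), e.g. in an Execute SQL Task:
DELETE FROM dbo.Inventory WHERE StateCode = ?;

-- Source query (Oracle OLE DB source):
SELECT * FROM inventory WHERE state_code = ?;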
OLE DB IRowsetFastLoad or ODBC bcp calls should perform similarly to SSIS, but you might be able to eke out some additional performance gains with a lot of attention to detail. However, using these APIs is not trivial unless you are already familiar with C++ and the APIs.
SqlBulkCopy is fast (generally millions of rows per minute), which is good enough performance for most applications without the additional complexity of unmanaged code. It is best to use the Oracle managed provider for the source SELECT query rather than the ODBC or OLE DB provider in .NET code.
My recommendation is you consider not only performance but also your existing skillset.
I used the "Microsoft SQL Server Migration Assistant (SSMA)" from MS once for this, and it actually did what it promised to do:
SQL Server Migration Assistant for Oracle (documentation)
Microsoft SQL Server Migration Assistant v6.0 for Oracle (download)
SQL Server Migration Assistant (SSMA) Team's Blog
However, in my case it was not as fast as I would have expected for an 80 GB Oracle DB (4 hours or something), and I had to do some manual steps afterwards, but the application was developed in hell anyway (one table had 90+ columns and 100+ indices).

What is the best way to move data between PostgreSQL and SQL Server databases

If we have the same database schema in a database on PostgreSQL and one on SQL Server (tables, primary keys, indexes, and triggers are the same), what would be the best way to move data from one database to the other? Currently we have an in-house .NET program that does the following through two ODBC connections:
read a row from source database table 1
construct an insert statement
write a row into destination database table 1
Go to 1 if there are more rows in the table
Move to next table in database and go to 1
Needless to say, this is a very slow process, and I would be interested to know if there is a better/faster solution.
If it's a "one off" migration, there's a tool you get with SQL Server which allows you to move data around between databases (I'm not on a Windows machine right now, so can't tell you what it's called - something like import/export tool).
If it's an ongoing synchronisation, you can look at the MS Sync framework, which plays nice with SQL Server and Postgres.
The answer is bulk export and bulk loading. You can go much faster by using the COPY command in PostgreSQL (https://www.postgresql.org/docs/current/static/sql-copy.html) to dump the table data in CSV format, and then using BULK INSERT to import the CSV files into SQL Server. A rule of thumb is to harness parallelism for the process: check whether you can load the CSV data into SQL Server in parallel, and if you have many tables you can also parallelize at the level of separate tables. By the way, loading or migrating data row by row is one of the slowest ways to do it.
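A minimal sketch of the two halves (file paths and table names are assumptions; server-side COPY needs file permissions on the database host, otherwise psql's \copy is the client-side equivalent):
-- PostgreSQL: dump a table to CSV.
COPY inventory TO '/tmp/inventory.csv' WITH (FORMAT csv);

-- SQL Server: bulk-load the transferred file.
BULK INSERT dbo.inventory
FROM 'C:\etl\inventory.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');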

How do you pull data from SQL Server to Oracle?

I want to take data from a SQL Server table and populate an Oracle table. Right now, my solution is to dump the data into an Excel table and write a macro to create a SQL file that I can load into Oracle. The problem is that I want to automate this process, and I'm not sure I can.
Is there an easy way to automate populating an Oracle table with data from a SQL Server table?
Thanks in advance.
I suppose it depends on your definition of "easy".
The most robust approach would be to either use heterogeneous connectivity in Oracle to create a database link to the SQL Server database and then pull the data from SQL Server or to create a linked server in SQL Server that connects to Oracle and then push the data from SQL Server to Oracle.
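As a hedged illustration of the push-from-SQL-Server option (the server name, data source, and table and column names are assumptions):
-- T-SQL: register the Oracle linked server
-- (OraOLEDB.Oracle is the Oracle-supplied OLE DB provider).
EXEC sp_addlinkedserver
     @server = N'ORA_LINK',
     @srvproduct = N'Oracle',
     @provider = N'OraOLEDB.Oracle',
     @datasrc = N'ORCL';

-- Push rows into the Oracle table through the link.
INSERT INTO OPENQUERY(ORA_LINK, 'SELECT col1, col2 FROM oracle_table')
SELECT col1, col2
FROM dbo.sql_server_table;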
Yes. Take a look at MS SQL's SSIS, which stands for SQL Server Integration Services. SSIS provides all sorts of advanced capabilities, including automation with SQL Server Agent jobs, for moving data between disparate data sources. In your case, connecting to Oracle can be achieved in a variety of ways.
There are three ways to automate this:
1) You can do as Paul suggested and create an SSIS package that will do this; it can be scheduled via SQL Agent.
2) If you don't want to deal with SSIS, you can download the free SQL# (SQLsharp) CLR library from http://www.SQLsharp.com/ and use the DB_BulkCopy stored procedure to do this in a T-SQL stored proc, which can also be scheduled via SQL Agent. [note: I am the author of SQL#]
3) You can also set up a Linked Server from SQL Server to Oracle, but this has the drawback of being a potential security hole. Of course, you could use an Oracle login that only has write access to that single table (or something similar).
There are lots and lots of ways to do it. Which you choose depends on your requirements.
Using Excel is fine if it's a one-time thing.
If it's a once-in-a-while thing, then you could write a simple .NET app that uses a single DataSet and multiple DataAdapters to do the data dump. C# code example here.
If it's a regular thing, then you could put the above in a Schtasks task, or you could use SSIS. I think SSIS is an extra-cost option.
If the requirement is for "online access", then a linked database is probably appropriate.

Best way to migrate export/import from SQL Server to Oracle

I'm faced with needing access, for reporting, to some data that lives in Oracle and other data that lives in a SQL Server 2000 database. For various reasons these live on different sides of a firewall. Now we're looking at doing an export/import from SQL Server to Oracle, and I'd like some advice on the best way to go about it. The procedure will need to be fully automated and run nightly, so that excludes using the SQL developer tools. I also can't make a live link between the databases from our (Oracle) side, as the firewall is in the way. The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting.
What I'm thinking of is writing a monster query for SQL Server (which I mostly have already) that will denormalise and read out the data into a flat file using the SQL Server equivalent of sqlplus as a scheduled task, dump it into a Well Known Location, and then on the Oracle side have a cron job that copies down the file, loads it with SQL*Loader, rebuilds indexes, etc.
This is all doable, but very manual. Is there one tool, or a combination of FOSS or standard Oracle/SQL Server tools, that could automate this for me? The irreducible complexity is the query on one side and building the indexes on the other, but I would love not to have to write the CSV-dumping detail or the SQL*Loader script: just say "dump this view out to CSV" on one side, and "truncate and insert into this table from CSV" on the other, without worrying about mapping column names and all the other arcane sqlldr voodoo...
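(For what it's worth, on the Oracle side an external table can stand in for a hand-written SQL*Loader control file; a minimal sketch, assuming the CSV lands in a directory object named ETL_DIR, with all object names hypothetical:)
-- Oracle: external table over the dumped CSV (the ORACLE_LOADER driver
-- does the parsing, so no separate sqlldr control file is needed).
CREATE TABLE report_stage_ext (
  col1 NUMBER,
  col2 VARCHAR2(100)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY etl_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('report.csv')
);

-- Then refresh the reporting table with a direct-path insert:
INSERT /*+ APPEND */ INTO report_table SELECT * FROM report_stage_ext;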
best practices? thoughts? comments?
edit: I have 50+ columns, all of varying types and lengths, in my dataset, which is why I'd prefer not to have to write out how to generate and map each individual column...
"The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting."
You are really looking for an ETL tool. If you have no money in the till, I suggest you check out the Open Source Talend and Pentaho offerings.

How can I copy data records between two instances of a SQL Server database

I need to copy some records from our SQL Server 2005 test server to our live server. It's a flat lookup table, so there are no foreign keys or other referential integrity to worry about.
I could key in the records again on the live server, but this is tiresome. I could export the test server's records and table data in their entirety into a SQL script and run that, but I don't want to overwrite the records present on the live system, only add to them.
How can I select just the records I want and get them transferred into the live server? We don't have SharePoint, which I understand would allow me to copy them directly between the two instances.
If your production SQL Server and test SQL Server can talk to each other, you can just do it with a SQL INSERT statement.
First, run the following on your test server:
EXEC sp_addlinkedserver 'PRODUCTION_SERVER_NAME';
Then just create the INSERT statement:
INSERT INTO [PRODUCTION_SERVER_NAME].DATABASE_NAME.dbo.TABLE_NAME (Names_of_Columns_to_be_inserted)
SELECT Names_of_Columns_to_be_inserted
FROM TABLE_NAME
-- add a WHERE clause here to copy only the records you want
I use SQL Server Management Studio and do an Export task by right-clicking the database and going to Tasks > Export. I think it works across servers as well as databases, but I'm not sure.
An SSIS package would be best suited to do the transfer; it would take literally seconds to set up!
I would just script to SQL and run it on the other server for quick-and-dirty transfers. If this is something you will be doing often and you need to set up a mechanism, SQL Server Integration Services (SSIS), which is similar to the older Data Transformation Services (DTS), is designed for this sort of thing. You develop the solution in a mini Visual Studio environment and can build very complex solutions for moving and transforming data.
