I am working on a project that migrates databases from Oracle 10g to SQL Server 2008 using SSMA (SQL Server Migration Assistant). I want to know if there is a way to compare the data in tables that reside in a tablespace, say 'A', on Oracle with the corresponding migrated database 'A' on SQL Server.
I am not bothered about the data types of the various columns right now. If there is a way to map them, that would be great. I am only concerned with any differences in the data.
Let me know if you are aware of a free tool that does this, or if any of you have written a tool that could help me do the same.
Thanks !!
You will have to map the PK from the source to the destination and, if the columns are the same, fetch the rows in bulk and compare them.
That is a lot of hard work.
It may be better to count the rows and verify a statistical sample of records.
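The count-and-sample check suggested above can be sketched as follows. This is a minimal illustration, assuming the rows from each database have already been fetched into dictionaries keyed by primary key; the names row_digest and compare_sample are hypothetical, and the text normalisation is naive (for example, 1 and 1.0 would hash differently), so real column type mapping would need more care.

```python
import hashlib
import random


def row_digest(row):
    """Hash a row after converting every value to text (None becomes empty)."""
    text = "\x1f".join("" if value is None else str(value) for value in row)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compare_sample(source, target, sample_size, seed=0):
    """Compare row counts, then spot-check a random sample of source keys.

    source, target: dict mapping primary key -> row tuple (assumed layout).
    Returns (counts_match, sorted list of sampled keys that are missing
    from target or whose rows differ).
    """
    counts_match = len(source) == len(target)
    keys = sorted(source)
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    sample = rng.sample(keys, min(sample_size, len(keys)))
    bad = [
        key for key in sample
        if key not in target or row_digest(source[key]) != row_digest(target[key])
    ]
    return counts_match, sorted(bad)
```

A sample only gives statistical confidence; a clean result does not prove that every row matches.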
Related
I have an Oracle database and a SQL Server database. There is one table say Inventory which contains millions of rows in both database tables and it keeps growing.
I want to compare the Oracle table data with the SQL Server data on a daily basis to find out which records are missing from the SQL Server table.
Which is the best approach for this?
Create SSIS package.
Create Windows service.
I want to achieve this in as little time and with as few resources as possible.
E.g.: 18 million records in Oracle and 16-17 million in SQL Server.
The two databases differ because they serve two different applications, one online and one offline.
EDIT: How about connecting to SQL Server from Oracle through Oracle Gateway for SQL Server, to:
1) Query SQL Server directly from Oracle to update SQL Server with the missing records the first time.
2) Create a trigger on Oracle that fires when a record is deleted from Oracle and inserts the deleted record into a new Oracle table.
3) Create an SSIS package to map the newly created Oracle table to SQL Server and update the SQL Server records. This way only a few records have to be processed daily through SSIS.
What do you think of this approach?
I would create an SSIS package and load the data from the Oracle table using a Data Flow with an OLE DB Source. If you have SQL Server Enterprise, the Attunity connectors are a bit faster.
Then I would load the keys from the SQL Server table into a Lookup transformation, match the two sources on the key, and direct unmatched rows to a separate output.
Finally I would direct the unmatched rows output to an OLE DB Command to update the SQL Server table.
This SSIS package will require a lot of memory, but as the matching is done in memory with minimal IO, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server Table.
SSIS also has the advantage that it has lots of other transformation functions available if you need them later.
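The Lookup step described above amounts to caching every target key in memory and streaming the source rows past that cache. The sketch below shows that logic in plain Python rather than SSIS; unmatched_rows and its parameters are hypothetical names used only for illustration.

```python
def unmatched_rows(source_rows, target_keys, key_index=0):
    """Yield source rows whose key does not appear among the target keys.

    source_rows: iterable of row tuples streamed from the source (Oracle).
    target_keys: iterable of key values read from the target (SQL Server).
    key_index:   position of the key column inside each source row.
    """
    cache = set(target_keys)  # the whole key set is held in memory
    for row in source_rows:   # the source is read once, row by row
        if row[key_index] not in cache:
            yield row
```

Memory use grows with the number of target keys, not with the width of the source rows, which matches the note above about needing enough free memory to cache all the keys.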
What you basically want to do is replication from Oracle to SQL Server.
You could do this in SSIS, a Windows service, or indeed on a multitude of other platforms.
The real trick is using the correct design pattern.
There are two general design patterns
Snapshot Replication
You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or compare on Oracle but not yet a suggestion to compare on SQL Server, although this is valid)
You are comparing 18 million records here so this is a lot of work
Differential replication
You record the changes in the publisher (i.e. Oracle) since the last replication then you apply those changes to the subscriber (i.e. SQL Server)
You can do this manually by implementing triggers and log tables on the Oracle side, then use a regular ETL process (SSIS, command line tools, text files, whatever), probably scheduled in SQL Agent to apply these to the SQL Server.
Or you could do this by using the out of the box replication capability to set up Oracle as a publisher and SQL as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx
You're going to have to try a few of these and see what works for you.
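The manual variant of differential replication (a change log filled by triggers on the publisher, applied periodically to the subscriber) can be sketched like this. It is an illustration under assumptions: the log layout (change_id, op, key, row) and the operation codes "I", "U" and "D" are invented for the example, and a dictionary stands in for the subscriber table.

```python
def apply_changes(subscriber, change_log, last_applied_id=0):
    """Apply logged changes to the subscriber and return the new high-water mark.

    subscriber: dict mapping key -> row (stand-in for the SQL Server table).
    change_log: iterable of (change_id, op, key, row) ordered by change_id,
                as a trigger-maintained log table might provide.
    last_applied_id: highest change_id applied by the previous run.
    """
    for change_id, op, key, row in change_log:
        if change_id <= last_applied_id:
            continue  # already applied in an earlier run
        if op in ("I", "U"):
            subscriber[key] = row       # insert or overwrite
        elif op == "D":
            subscriber.pop(key, None)   # delete if present
        else:
            raise ValueError(f"unknown operation {op!r}")
        last_applied_id = change_id
    return last_applied_id
```

Storing the returned high-water mark between runs means each run only touches the changes made since the previous one, which is where the saving over a full snapshot comparison comes from.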
Given this objective:
I want to consume less resource to achieve this functionality which takes less time and less resource
transactional replication is far more efficient but complicated. For maintenance purposes, which platforms (.Net, SSIS, Python etc.) are you most comfortable with?
Other alternatives:
If you can use Oracle gateway for SQL Server then you do not need to transfer data and can make the query directly.
If you can't use Oracle Gateway, you can use Pentaho Data Integration or another ETL tool to compare the tables and get the results. It is easy to use.
I think the best approach is to use Oracle Gateway. Just follow the steps below. I have had a similar experience.
Install and Configure Oracle Database Gateway for SQL Server.
https://docs.oracle.com/cd/B28359_01/gateways.111/b31042/installsql.htm
Now you can create a database link from Oracle to SQL Server.
Create a procedure that finds the records present in the Oracle database but missing from SQL Server and inserts them into the SQL Server database.
For example, you can use this statement inside your procedure.
INSERT INTO "dbo"."sql_server_table"@dblink_name ("column1","column2"...."column5")
select column1,column2....column5 from oracle_table
minus
select "column1","column2"...."column5" from "dbo"."sql_server_table"@dblink_name
Create a scheduler job that executes the procedure daily.
When both databases are online, the missing records are inserted into SQL Server. If one of them is offline, the scheduled run fails, and you can execute the procedure manually later.
This approach uses minimal resources.
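The set-difference insert at the heart of that procedure can be tried locally with a stand-in. The sketch below uses Python's sqlite3 module with two tables in one in-memory database, and SQLite's EXCEPT operator in place of Oracle's MINUS; the table and column names are assumptions, and no gateway or database link is involved.

```python
import sqlite3


def insert_missing(conn):
    """Insert rows found in source_table but not in target_table.

    Returns the number of rows added. Table names are illustrative only.
    """
    before = conn.execute("SELECT COUNT(*) FROM target_table").fetchone()[0]
    conn.execute(
        "INSERT INTO target_table (id, name) "
        "SELECT id, name FROM source_table "
        "EXCEPT "  # SQLite's counterpart of Oracle's MINUS
        "SELECT id, name FROM target_table"
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM target_table").fetchone()[0]
    return after - before
```

Because the difference is taken over all listed columns, a row whose key exists in the target with different values also counts as missing; comparing on key columns only avoids inserting a duplicate key in that case.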
I suggest a homemade ETL solution.
Schedule an Oracle job to export the source table data (daily, based on the application logic) to plain CSV format.
Schedule a SQL Server job (with an acceptable delay after the Oracle job) to read this CSV file and import it into a staging table in SQL Server using BULK INSERT.
The last part of the SQL Server job reads the staging table and applies the logic (insert into or update the target table). I suggest having another table to store a report of each daily job's result.
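The staging-then-merge logic of that job can be sketched in Python. This is only an illustration of the flow: in the setup described above the import would be done by BULK INSERT and the merge by T-SQL, and the function names, the "id" key column and the report fields here are assumptions.

```python
import csv
import io


def load_staging(csv_text):
    """Parse exported CSV text (with a header row) into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def merge_staging(target, staging, key="id"):
    """Insert new rows and update changed rows; return counts for the report table.

    target:  dict mapping key value -> row dict (stand-in for the target table).
    staging: list of row dicts as returned by load_staging.
    """
    report = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in staging:
        row_key = row[key]
        if row_key not in target:
            target[row_key] = row
            report["inserted"] += 1
        elif target[row_key] != row:
            target[row_key] = row
            report["updated"] += 1
        else:
            report["unchanged"] += 1
    return report
```

The returned counts are the kind of figures that could be written to the separate report table after each run.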
We are migrating our data warehouse database from Oracle to DB2, so from now on our ETL tool generates the data and loads it into DB2. We want to make sure the data is loaded properly into DB2 after the ETL jobs have been migrated from Oracle to DB2. In short, how can we verify that two tables (one in Oracle and one in DB2) loaded by the same job contain the same data?
In my experience, it has always been a manual process.
You write SQL to count records and verify other objects in the database.
I would assume you can export the tables to files and compare the files.
Oracle has some validation tools/products (which I am not familiar with).
The problem lies in the testing strategy. You should build one before the migration.
Some checks I used when migrating a DWH: record counts, sum(money_field), sum(revenue), and comparing the top rows under a fixed ordering.
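The count-and-sum checks listed above can be expressed as a small profile that is computed once per database and then compared. The sketch assumes rows are available as dictionaries; table_profile and profile_diff are hypothetical names, and Decimal is used so that values arriving as integers, floats or strings from different drivers sum consistently.

```python
from decimal import Decimal


def table_profile(rows, numeric_columns):
    """Return the row count and the per-column sums for the given columns."""
    count = 0
    sums = {column: Decimal(0) for column in numeric_columns}
    for row in rows:
        count += 1
        for column in numeric_columns:
            if row[column] is not None:  # NULLs are skipped, as SUM() does
                sums[column] += Decimal(str(row[column]))
    return {"count": count, "sums": sums}


def profile_diff(left, right):
    """List which measures ("count" or a column name) differ between two profiles."""
    diffs = []
    if left["count"] != right["count"]:
        diffs.append("count")
    for column, total in left["sums"].items():
        if total != right["sums"].get(column):
            diffs.append(column)
    return diffs
```

Matching totals do not prove that the rows are identical, since offsetting errors cancel out, so this works best alongside row-level spot checks.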
I'm copying about 200 tables from Oracle to SQL Server using SSIS. Right now, the basic package template follows this logic:
Get time
Truncate table
Load data and get row count
Record table name, row count, and time to log table.
Currently, I copy and paste the package and change the data flow. Is there a better way to do this? I know SSIS is metadata driven, but doing 200 tables like this is a little ridiculous. And if my boss wants me to change something in the template then I get to do it all over again. Is there a way to loop through tables? I would just use linked servers in SQL Server but since we have SQL Server Enterprise I'm able to use the Attunity connectors and they are much faster.
Any help would be appreciated. It seems like there must be a better way but I'm not familiar enough with SSIS to really know what to ask for.
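The four template steps above are the same for every table, so they can be driven by a list of table names. The sketch below shows that shape in plain Python, not SSIS; copy_all_tables and the three callables passed to it (truncate, load, log) are hypothetical placeholders for whatever actually performs each step.

```python
import time


def copy_all_tables(table_names, truncate, load, log):
    """Run the same get-time / truncate / load / log steps for every table.

    truncate(name)                  empties the destination table.
    load(name)                      copies the data and returns the row count.
    log(name, row_count, started)   records the result in the log table.
    """
    results = []
    for name in table_names:
        started = time.time()          # 1. get time
        truncate(name)                 # 2. truncate table
        row_count = load(name)         # 3. load data and get row count
        log(name, row_count, started)  # 4. record name, row count and time
        results.append((name, row_count))
    return results
```

With this layout a change to the template is made once, in the loop body, instead of in 200 copies.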
I'm faced with needing access for reporting to some data that lives in Oracle and other data that lives in a SQL Server 2000 database. For various reasons these live on different sides of a firewall. Now we're looking at doing an export/import from sql server to oracle and I'd like some advice on the best way to go about it... The procedure will need to be fully automated and run nightly, so that excludes using the SQL developer tools. I also can't make a live link between databases from our (oracle) side as the firewall is in the way. The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting.
What I'm thinking about is writing a monster query for SQL Server (which I mostly have already) that will denormalise and read out the data from SQL Server into a flat file using the sql server equivalent of sqlplus as a scheduled task, dump into a Well Known Location, then on the oracle side have a cron job that copies down the file and loads it with sql loader and rebuilds indexes etc.
This is all doable, but very manual. Is there one, or a combination, of FOSS or standard Oracle/SQL Server tools that could automate this for me? The irreducible complexity is the query on one side and building indexes on the other, but I would love not to have to write the CSV dumping detail or the SQL*Loader script. I'd like to just say "dump this view out to CSV" on one side, and on the other "truncate and insert into this table from CSV", and not worry about mapping column names and all the other arcane sqlldr voodoo...
best practices? thoughts? comments?
edit: I have about 50+ columns all of varying types and lengths in my dataset, which is why I'd prefer to not have to write out how to generate and map each single column...
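One way to avoid listing each of the 50+ columns by hand on the export side is to take the header from the query's own metadata. The sketch below uses the standard DB-API cursor.description attribute and is demonstrated against sqlite3 only as a stand-in; connecting to SQL Server would need a separate driver, which is not shown, and dump_query_to_csv is a hypothetical name.

```python
import csv
import sqlite3


def dump_query_to_csv(conn, query, out):
    """Write the result of a query to a CSV stream, header included.

    Column names come from cursor.description, so nothing is mapped by hand.
    Returns the number of data rows written.
    """
    cursor = conn.execute(query)
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count
```

The load side would still need to agree on how NULLs, dates and embedded delimiters are represented in the file.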
"The data needs to be transformed in the process from a star schema to a de-normalised table ready for reporting."
You are really looking for an ETL tool. If you have no money in the till, I suggest you check out the Open Source Talend and Pentaho offerings.
I am tasked with exporting the data contained inside a MaxDB database to SQL Server 200x. I was wondering if anyone has gone through this before and what your process was.
Here is my idea, but it's not automated.
1) Export data from MaxDB for each table as a CSV.
2) Clean the CSV to remove ? (which it uses for nulls) and fix the date strings.
3) Use SSIS to import the data into tables in SQL Server.
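The null-cleaning part of step 2 can be sketched as below. It assumes a field that consists solely of "?" stands for a null and that a "?" inside longer text should be kept; clean_maxdb_csv is a hypothetical name. The date fix is left out because the post does not say what format the date strings are in.

```python
import csv
import io


def clean_maxdb_csv(csv_text, null_marker="?"):
    """Replace fields that are exactly the null marker with empty fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for record in csv.reader(io.StringIO(csv_text)):
        writer.writerow(["" if field == null_marker else field for field in record])
    return out.getvalue()
```

Parsing with the csv module rather than a plain string replace keeps question marks that are part of real text values intact.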
I was wondering if anyone has tried linking MaxDB to SQL Server or what other suggestions or ideas you have for automating this.
Thanks.
AboutDev.
I managed to find a solution to this. There is an open source MaxDB library that will allow you to connect to it through .Net much like the SQL provider. You can use that to get schema information and data, then write a little code to generate scripts to run in SQL Server to create tables and insert the data.
MaxDb Data Provider for ADO.NET
If this is a one time thing, you don't have to have it all automated.
I'd pull the CSVs into SQL Server tables and keep them forever; this will help with any questions a year from now. You can give them all the same prefix, "Conversion_" or whatever. There are no constraints or FKs on these tables. You might consider using varchar for every column (or only for the ones that cause problems, or not at all if the data is clean), just to be sure there are no data type conversion issues.
Then pull the data from these conversion tables into the proper final tables. I'd use a single conversion stored procedure to do everything (but I like T-SQL). If the data isn't that large (millions of rows or fewer), just loop through and build out all the tables, printing log info as necessary, or inserting into exception/bad-data tables as necessary.
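Creating the all-varchar conversion tables can be scripted from the CSV header rows. The sketch below generates one CREATE TABLE statement per file; conversion_table_ddl is a hypothetical helper, the "Conversion_" prefix follows the suggestion above, and varchar(max) assumes SQL Server 2005 or later.

```python
import csv
import io


def conversion_table_ddl(table_name, csv_text, prefix="Conversion_"):
    """Build a CREATE TABLE statement with one varchar column per CSV header field."""
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = ",\n".join(
        # a closing bracket inside a name is escaped by doubling it
        "    [{}] varchar(max) NULL".format(name.strip().replace("]", "]]"))
        for name in header
    )
    return "CREATE TABLE [{}{}] (\n{}\n);".format(prefix, table_name, columns)
```

Only the first line of the CSV is read, so the same function works on a full export or on a header-only sample.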