We have an SSIS package that loads data from an Oracle DB table to a SQL Server table. The data volume is very large, so whenever the job runs it takes a long time to insert the records. Below are the constraints of our requirement:
There is no last-modified column in the source table, so we cannot tell which records were updated and which were newly inserted. Hence we truncate the destination table and reload all of the data from the source table every time.
We have tried a Lookup transformation, but with no luck.
Can anyone please suggest a better solution for building a package that loads this data in less time?
I have a simple ETL job copying data from MS SQL to DB2 using DataStage. I need to update a column in MS SQL, "SenttoDB2", once I have successfully copied the data to DB2.
I figured that I just need to create another stage after DB2 and pass the "key" from the source in the update SQL to update the column. Is this correct or am I missing a step somewhere?
You could add an after-stage update SQL to the source stage. The SQL will be executed when the data is pulled, but it will be rolled back on job failure.
If the timing is exceptionally important then you will need to create a second job that updates the source table after the job completes.
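For illustration, a minimal sketch of what that after-stage update could look like on the MS SQL side, assuming a hypothetical dbo.SourceTable with a KeyColumn and the SenttoDB2 flag (all names are placeholders):

-- Hypothetical after-stage / post-copy update run against the MS SQL source table.
-- In DataStage the key value would typically be bound per row by the stage,
-- which is what the ? parameter marker stands for here.
UPDATE dbo.SourceTable
SET    SenttoDB2 = 'Y'
WHERE  KeyColumn = ?;

A separate follow-up job, as described above, would instead run a set-based version of the same UPDATE once the load is confirmed.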
We have around 5000 tables in Oracle and the same 5000 tables exist in SQL Server. Each table's columns vary frequently, but at any point in time the source and destination columns will always be the same. Creating 5000 Data Flow Tasks is a big pain. Further, the mappings need to be redone every time a table definition changes, such as when a column is added or removed.
We tried SSMA (SQL Server Migration Assistant for Oracle), but it is very slow for transferring huge amounts of data, so we moved to SSIS.
I have followed the approach below in SSIS:
I created a staging table that holds the table name, the source query (Oracle) and the target query (SQL Server), used that table in an Execute SQL Task, and stored the result set as a full result set.
I created a Foreach Loop Container over that Execute SQL Task result set, with the object variable and three variables: table name, source query and destination query.
In the Data Flow Task source, I chose an OLE DB Source with the Oracle connection and set the data access mode to SQL command from a variable (passing the source query from the loop's variable mappings).
In the Data Flow Task destination, I chose an OLE DB Destination with the SQL Server connection and set the data access mode to SQL command from a variable (passing the target query from the loop's variable mappings).
I am looping this for all 5000 tables, but it is not working. Can you please guide me on how to create this dynamically for 5000 tables from Oracle to SQL Server using SSIS? Any sample code or help would be greatly appreciated. Thanks in advance.
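For reference, a rough sketch of the kind of staging/metadata table described above, with purely hypothetical table, column and query names and two illustrative rows:

-- Hypothetical metadata table that drives the Foreach Loop Container.
CREATE TABLE dbo.TableLoadConfig
(
    TableName   SYSNAME       NOT NULL PRIMARY KEY,  -- destination table
    SourceQuery NVARCHAR(MAX) NOT NULL,              -- query run against Oracle
    TargetQuery NVARCHAR(MAX) NOT NULL               -- query/statement for SQL Server
);

INSERT INTO dbo.TableLoadConfig (TableName, SourceQuery, TargetQuery)
VALUES
    (N'EMPLOYEES',
     N'SELECT EMPLOYEE_ID, FIRST_NAME, DEPARTMENT_ID FROM HR.EMPLOYEES',
     N'SELECT EMPLOYEE_ID, FIRST_NAME, DEPARTMENT_ID FROM dbo.EMPLOYEES'),
    (N'DEPARTMENTS',
     N'SELECT DEPARTMENT_ID, DEPARTMENT_NAME FROM HR.DEPARTMENTS',
     N'SELECT DEPARTMENT_ID, DEPARTMENT_NAME FROM dbo.DEPARTMENTS');

-- Query used by the Execute SQL Task (result set type: Full result set).
SELECT TableName, SourceQuery, TargetQuery
FROM   dbo.TableLoadConfig;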
In SSIS, when thinking about a dynamic source or destination, you have to take into consideration that you can only do this when the metadata is well defined at run-time. In your case:
Each table's columns vary frequently, but at any point in time the source and destination columns will always be the same.
You have to think about building packages programmatically rather than looping over tables.
Yes, you can use loops if you can classify the tables into groups based on their metadata (column names, data types, ...). Then you can create a package for each group.
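As an illustration only, such a classification could be derived on the SQL Server side from INFORMATION_SCHEMA; the signature-building query below is an assumption about how one might group the tables, not something prescribed here:

-- Builds a per-table signature from the ordered column names and data types.
-- Tables sharing the same signature have the same shape and could, in
-- principle, be handled by one parameterized package.
WITH ColumnSignature AS
(
    SELECT
        t.TABLE_SCHEMA,
        t.TABLE_NAME,
        (
            SELECT ',' + c.COLUMN_NAME + ':' + c.DATA_TYPE
            FROM   INFORMATION_SCHEMA.COLUMNS AS c
            WHERE  c.TABLE_SCHEMA = t.TABLE_SCHEMA
              AND  c.TABLE_NAME   = t.TABLE_NAME
            ORDER BY c.ORDINAL_POSITION
            FOR XML PATH('')
        ) AS Signature
    FROM INFORMATION_SCHEMA.TABLES AS t
    WHERE t.TABLE_TYPE = 'BASE TABLE'
)
SELECT Signature, TABLE_SCHEMA, TABLE_NAME
FROM   ColumnSignature
ORDER BY Signature, TABLE_SCHEMA, TABLE_NAME;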
If you are familiar with C#, you can dynamically import tables without needing SSIS. You can refer to the following project to learn more about reading from Oracle and importing into SQL Server using C#:
Github - SchemaMapper
I will provide some links that you can refer to for more information about creating packages programmatically and dynamic column mapping:
How to manage SSIS script component output columns and its properties programmatically
How to Map Input and Output Columns dynamically in SSIS?
Implementing Foreach Looping Logic in SSIS
I am trying to copy data from views on a trusted SQL Server 2012 instance to tables on a local instance of SQL Server as a scheduled transfer. What would be the best practice for this situation?
Here are the options I have come up with so far:
Write an executable program in C# or VB to delete the existing local tables, query the data from the remote database, and then write the results to tables in the local database. The executable would run as a scheduled task.
Use BCP to copy the data to a file and then load it into the local table.
Use SSIS
Note: The connection between local and remote SQL Server is very slow.
Since the transfers are scheduled, I suppose you want this data to be kept up to date.
My recommendation would be to use SSIS and schedule it using SQL Agent. If you wrote a C# program, I think the best outcome you would gain is a program imitating SSIS. Moreover, with SSIS it will be very easy to amend the workflow at any time.
Either way, to keep such a program/package up to date, you will have to answer an important question: is the source table updatable, or is it like a log (inserts only)?
This question is important because it determines how you will fetch new updates from the source table. For example, if the table represents a log, you will most probably use the primary key to detect new records; if not, you might want to look for a column representing the update date/time. If you have the authority to alter the source table, you might want to add a timestamp column which represents the row version (timestamp differs from datetime).
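As a sketch of that last suggestion, and assuming a hypothetical dbo.SourceTable you are allowed to alter, adding such a row-version column could look like this (rowversion is the current name of the timestamp data type):

-- Adds an automatically maintained row-version column; its value changes on
-- every insert and update, so it can serve as a change-detection watermark.
ALTER TABLE dbo.SourceTable
    ADD RowVer rowversion NOT NULL;

-- Rows changed since the last transfer (watermark kept by the package):
-- SELECT * FROM dbo.SourceTable WHERE RowVer > @LastSyncedRowVersion;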
An SSIS package for this will mainly contain the following components (a minimal T-SQL sketch of the pattern follows the list):
An Execute SQL Task to get the maximum value from the source table.
An Execute SQL Task to get the last value it should start from at the destination table. You can get this value either by selecting the maximum value from the destination table or, if the table is pretty large, by storing that value in another table (a configuration table, for example).
A Data Flow Task which moves the data from the source table, starting after the value fetched in step 2 and up to the value fetched in step 1.
An Execute SQL Task that writes the new maximum value back to the configuration table, if you chose that technique.
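A minimal T-SQL sketch of that pattern, assuming a hypothetical configuration table and an Id column used as the incremental key; the ? markers stand for the SSIS variables that carry the values between tasks (all names are placeholders):

-- Hypothetical watermark/configuration table.
CREATE TABLE dbo.LoadWatermark
(
    TableName    SYSNAME NOT NULL PRIMARY KEY,
    LastLoadedId BIGINT  NOT NULL
);

-- Step 1: maximum value currently in the source.
SELECT MAX(Id) AS MaxSourceId FROM dbo.SourceView;

-- Step 2: last value already loaded into the destination.
SELECT LastLoadedId FROM dbo.LoadWatermark WHERE TableName = N'SourceView';

-- Step 3: Data Flow source query, bounded by the two values above.
SELECT *
FROM   dbo.SourceView
WHERE  Id >  ?   -- value from step 2
  AND  Id <= ?;  -- value from step 1

-- Step 4: persist the new maximum after a successful load.
UPDATE dbo.LoadWatermark
SET    LastLoadedId = ?   -- value from step 1
WHERE  TableName = N'SourceView';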
BCP can be used to export the data, which can then be compressed, transferred over the network, and imported into the local instance of SQL Server.
Also, with BCP the export can be split into smaller batches of data for easier management.
https://msdn.microsoft.com/en-us/library/ms191232.aspx
https://technet.microsoft.com/en-us/library/ms190923(v=sql.105).aspx
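On the import side, one option for loading a bcp-exported character file into the local table in smaller batches is BULK INSERT; the path, table name and delimiters below are placeholders:

-- Loads a character-format bcp export into the local table in 50,000-row batches.
BULK INSERT dbo.LocalTable
FROM 'C:\transfer\remote_export.dat'
WITH
(
    DATAFILETYPE    = 'char',
    FIELDTERMINATOR = '|',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 50000,  -- commit in smaller batches
    TABLOCK
);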
I have a table for biometric devices which captures the data as soon as employees punch their fingers, and it uses SQL Server 2014 Standard Edition.
However, our legacy devices were exporting log files, and we used a VB engine to push them to our Oracle table and generate the attendance details.
I managed to export the data from SQL Server and built the first set of records. I would like to schedule a job in SQL Server with the condition that the Oracle table should receive ONLY the rows that have NOT already been inserted from the SQL Server table.
I checked the append options, which dump the entire SQL Server table every time the job is executed, thus duplicating rows in the Oracle target table. This forced me to discard that job and build a new one that deletes the Oracle table and recreates it when the job runs. I feel this is a kind of overkill...
Are there any known methods to append only the rows that do NOT already exist in the Oracle target table? Unfortunately the SQL Server table doesn't have any unique ID column for the transactions.
Please suggest
Thanks in advance
I think the best way is to use SQL Server replication with the Oracle database as a subscriber.
You can read about this solution on the MSDN site:
Oracle Subscribers
Regards
Giova
Since you're talking about attendance data for something like an electronic time card, you could just send the rows where the punch time is greater than the last synced timestamp. You would need to maintain that value somewhere, and it doesn't take retroactively entered records into account. If there is a record creation date in addition to the punch time, you could use the created date instead. Further, if there is a modified date on the record, you could look into using a MERGE statement, as Alex Pool suggested, so you could get both new records and modifications synced to Oracle.
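A minimal sketch of that punch-time watermark idea, assuming a hypothetical dbo.PunchLog table and a one-row sync-state table; the actual push to Oracle would be done by the job's destination step (for example a linked server or an SSIS data flow):

-- Hypothetical table holding the last punch time already pushed to Oracle.
CREATE TABLE dbo.OracleSyncState
(
    LastSyncedPunchTime DATETIME2(0) NOT NULL
);

-- Rows to send in this run: everything punched after the stored watermark.
DECLARE @LastSynced DATETIME2(0) =
    (SELECT LastSyncedPunchTime FROM dbo.OracleSyncState);

SELECT EmployeeId, DeviceId, PunchTime
FROM   dbo.PunchLog
WHERE  PunchTime > @LastSynced;

-- After a successful push, advance the watermark.
UPDATE dbo.OracleSyncState
SET    LastSyncedPunchTime = (SELECT MAX(PunchTime) FROM dbo.PunchLog);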
I have a strange problem when extracting rows from our MS CRM. Let me explain the flow:
I have an SSIS package that extracts data from CRM and loads it into another database. From this database the data is transformed and loaded into a data warehouse.
If this SSIS package runs at the same time as CRM is inserting/updating many rows, I get a strange result: the data extracted in SSIS (using a normal OLE DB Source) contains duplicate rows (the ModifiedOn date is different, though).
If I run the exact same query that extracts the data from CRM manually in SSMS, I get no duplicates.
So... it seems like the connection from SSIS somehow reads rows that are being updated, both before and after the update (dirty reads of some kind).
Anybody experienced this before?
Thx a lot
/Nicolaj
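One way to narrow down the dirty-read suspicion is to pin the isolation level explicitly for the extract query and compare the results. This is only a diagnostic sketch with placeholder names, and it assumes snapshot isolation is enabled on the CRM database:

-- Runs the extract under snapshot isolation so the result reflects a single
-- point in time, even while CRM is inserting/updating rows concurrently.
-- dbo.FilteredContact and the column names are placeholders for whatever is extracted.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

BEGIN TRANSACTION;

SELECT ContactId, ModifiedOn  -- plus the other extracted columns
FROM   dbo.FilteredContact;

COMMIT TRANSACTION;

If the duplicates disappear under an explicit isolation level, the SSIS connection's default isolation/locking behaviour is the likely place to look.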