I want to use a MERGE statement in SSIS. I have one source (Oracle) and one destination (SQL Server). Both tables have the same structure.
I need to insert, update, and delete data based on some date criteria. My question is: should I use a Merge Join or a Lookup, given that I have more than 40 million records in Oracle?
If you need more clarification, let me know and I will provide more info. I am not great at posting questions, so forgive me.
Personally I would transfer the Oracle table to SQL Server and perform any operations locally. I use this approach almost always (though nothing quite the size of your data), and it's also useful when dealing with cloud-based databases (latency, etc.). It's worth noting that if you don't have a datetime column in your source, you can use the ORA_ROWSCN pseudocolumn, which gives you a crude change set to load locally.
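For example, the Oracle-side extraction could look something like this (the schema, table and bind variable names are just placeholders):
-- ORA_ROWSCN is tracked per block unless the table was created with
-- ROWDEPENDENCIES, so treat this as a coarse change filter, not a precise one.
SELECT t.*, ORA_ROWSCN AS row_scn
FROM my_schema.my_table t
WHERE ORA_ROWSCN > :last_extracted_scn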
I have read lots of tales about the Merge Join not performing accurate joins; I would expect that with data of your size it could be an issue.
A Lookup could also be an issue due to the size, as it has to cache everything (it would attempt to load all the Oracle records into SSIS anyway, so you are better off transferring them locally first).
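Once the data is staged locally, the insert/update/delete can be done in a single T-SQL MERGE instead of a Merge Join or Lookup. A minimal sketch, assuming a staging table dbo.Staging_Source and a destination dbo.Destination keyed on Id, with a LastModified date column standing in for your date criteria (all names are made up):
MERGE dbo.Destination AS tgt
USING dbo.Staging_Source AS src
    ON tgt.Id = src.Id
WHEN MATCHED AND src.LastModified > tgt.LastModified THEN
    UPDATE SET Col1 = src.Col1,
               Col2 = src.Col2,
               LastModified = src.LastModified
WHEN NOT MATCHED BY TARGET THEN
    INSERT (Id, Col1, Col2, LastModified)
    VALUES (src.Id, src.Col1, src.Col2, src.LastModified)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;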
Hope this helps :)
During our SQL Server database deployments, we create a temporary table which contains the new desired state of data for a particular table. We then merge the temp table into the target table (we actually use individual insert, update and delete statements, but that's probably not relevant). The inserts/updates/deletes performed are captured and written out to a log.
We would like to be able to report on what changes would be applied by a deployment, without actually applying them. This is currently done by rolling back the transaction at the end of the above process. This doesn't feel particularly great though.
What we are thinking of doing instead is, rather than performing the changes and rolling them back, generating a migration script for the table (some SQL code that performs the necessary inserts, updates and deletes). If we want to do the actual deployment, this code will be executed dynamically. If not, it will just be printed to a log.
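As a rough illustration of what we have in mind (the table and column names here are hypothetical, and a real version would be driven by metadata for each deployed table):
-- Sketch only: emit the DML that would make dbo.Target match dbo.TargetDesired.
-- Single-column key (Id) and one data column (Name); NULL handling is
-- deliberately simplified.
DECLARE @sql nvarchar(max) = N'';

-- Deletes: rows in the target that are not in the desired state.
SELECT @sql += N'DELETE FROM dbo.Target WHERE Id = '
            + CAST(t.Id AS nvarchar(20)) + N';' + CHAR(10)
FROM dbo.Target AS t
WHERE NOT EXISTS (SELECT 1 FROM dbo.TargetDesired AS d WHERE d.Id = t.Id);

-- Updates: rows in both, with different values.
SELECT @sql += N'UPDATE dbo.Target SET Name = ' + QUOTENAME(d.Name, '''')
            + N' WHERE Id = ' + CAST(d.Id AS nvarchar(20)) + N';' + CHAR(10)
FROM dbo.TargetDesired AS d
JOIN dbo.Target AS t ON t.Id = d.Id
WHERE t.Name <> d.Name;

-- Inserts: rows in the desired state that are not yet in the target.
SELECT @sql += N'INSERT INTO dbo.Target (Id, Name) VALUES ('
            + CAST(d.Id AS nvarchar(20)) + N', ' + QUOTENAME(d.Name, '''') + N');' + CHAR(10)
FROM dbo.TargetDesired AS d
WHERE NOT EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.Id = d.Id);

PRINT @sql;                  -- report-only mode: just log the script
-- EXEC sp_executesql @sql;  -- deployment mode: actually apply the changes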
It shouldn't take long to put together some code which can generate migration scripts for two specified tables, but I first wanted to verify that there isn't already an existing tool which can do this?
Searching on Google, I can find lots of talk about migrating whole databases, but nothing about generating a data migration script to effectively merge one table into another.
So my question is, does anyone know of such a tool?
There are several data compare tools like:
SQL Data Compare from Red Gate
SQL Server Data Tools
dbForge Data Compare from Devart
Is that what you're looking for?
I've two tables:
Table A: 631 476 rows
Table B: 12 90 rows
Each table has an ID field that I want to use as the key in the Merge object. In the following image you can see that the process blocks before the Merge object. I have already tested with the Merge Join object and the results are the same...
What other possibilities do I have for performing this operation using SSIS 14?
Thanks!
If both source tables are on the same server, don't do it this way. You should simply write a query on the SQL Server side.
Something like this:
SELECT *
FROM [Table A]
INNER JOIN [Table B] ON [Table A].ID = [Table B].ID
ORDER BY ...
As James Serra said in When to use T-SQL or SSIS for ETL:
Performance – With T-SQL, everything is processed within the SQL engine. With SSIS, you are bringing all the data over to the SSIS memory space and doing the manipulation there. So if speed is an issue, usually T-SQL is the way to go, especially when dealing with a lot of records. Something like a JOIN statement in T-SQL will go much faster than using lookup tasks in SSIS. Another example is a MERGE statement in T-SQL has much better performance than a SCD task in SSIS for large tasks
Features/capabilities – Some features can only be done in either T-SQL or SSIS. You can shred text in SSIS, but can’t in T-SQL. For example, text files with an inconsistent number of fields per row can only be done in SSIS. So certain tasks may force you into using one or the other
Current skill set – Are the people in your IT department more familiar with SSIS or T-SQL?
Ease of development/maintenance – Of course, whichever one you are most familiar with will be the easiest, but if your skills at both are fairly even, then SSIS is usually easier to use because it is graphical, but sometimes you can develop quicker in T-SQL. For example, having to join a bunch of tables will require a bunch of tasks in SSIS, where in T-SQL it is one statement. So it might be easier to create the tasks to join the tables in SSIS, but it will take longer to build than writing a T-SQL statement
Complexity – SSIS can be more complex because you might need to create many tasks to accomplish your objective, where in T-SQL it might just be one statement, like in the example above for joining tables
Extensibility – SSIS has better extensibility because you can create a script task that uses C# that can do just about anything, especially for non-database related tasks. T-SQL is limited because it is only for database tasks. SSIS also has logging, which T-SQL does not
Likelihood of deprecation/breaking changes – Minor issue, but T-SQL is always removing features in each release that will have to be rewritten
Types/architecture of sources and destinations – SSIS is better if you have multiple types of sources. For example, it works really well with Oracle, XML, flat-files, etc. SSIS was designed from the beginning to work well with other sources, where T-SQL is designed for SQL Server and it requires more steps to access other sources, and there are additional limitations when doing so
Local regulations – Are there some company standards you have to adhere to that would limit which tool you can use?
I have had issues doing joins or merges in SSIS. Instead, I write the T-SQL version and run it from an Execute SQL Task. It always runs much faster for me that way.
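For example, rather than a Lookup or Merge Join in the data flow, something like this in an Execute SQL Task (table and column names are placeholders):
-- Insert only the staged rows that are not already in the destination.
INSERT INTO dbo.Destination (Id, Col1, Col2)
SELECT s.Id, s.Col1, s.Col2
FROM dbo.Staging_Source AS s
LEFT JOIN dbo.Destination AS d ON d.Id = s.Id
WHERE d.Id IS NULL;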
I want to use SqlBulkCopy to get data from my .Net app into SQL Server, to improve performance.
But the DBA has made all the really big tables (the ones where SqlBulkCopy would really shine) into partitioned views.
There are no articles on SO about this, and there are questions on the web but none of them are answered.
I'm looking for a workaround to make this work.
Note:
I'm going to edit my question tomorrow with the exact error message and whatever other details I can bring. None of the questions on the internet include the error that SQL Server returns.
Partitioned views (as opposed to partitioned tables, which are something different) have a long list of restrictions on inserts and updates, so most likely the view is effectively read-only for your bulk insert and you must write to the correct underlying table. Simple as that.
It is also possible that there is an INSTEAD OF trigger on the view that is not fired by a bulk copy. That said, pointing SqlBulkCopy directly at the final table is not great practice anyway (its locking behaviour does not scale well), so the usual approach is to SqlBulkCopy into a staging table and then insert into the final table from there (avoiding the bad locking in SqlBulkCopy). In that case the trigger fires.
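In other words, something along these lines on the SQL Server side (the staging and member table names are made up):
-- 1. SqlBulkCopy from the .NET app targets this staging table instead of the view.
CREATE TABLE dbo.Staging_Orders
(
    OrderId    int      NOT NULL,
    OrderDate  datetime NOT NULL,
    CustomerId int      NOT NULL
);

-- 2. Then move the rows into the correct underlying member table (or into the
--    view, if it meets the updatable partitioned view requirements).
INSERT INTO dbo.Orders_2023 (OrderId, OrderDate, CustomerId)
SELECT OrderId, OrderDate, CustomerId
FROM dbo.Staging_Orders
WHERE OrderDate >= '2023-01-01' AND OrderDate < '2024-01-01';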
I have an SSIS project where one of the steps involves populating a SQL Server table from an Oracle Table.
The Oracle table has a column ssis_control_flag. I want to pull across all records that have this field set to 'T'.
Now, I was wondering which would be the best way of doing this, and the two options I have detailed in the question presented themselves.
So really, I am wondering which would be faster/better. Should I create a conditional split in the SSIS package that filters off all the records I want? Or should I create a view in Oracle that selects the records based on the criteria, and utilise that view as the data source in SSIS?
Or is there an even better way of doing this? Your help would be much appreciated!
Thanks
Why don't you use a WHERE clause to filter the records, instead of creating a view? Maybe I am not getting your question correctly.
In general, bringing all the data into SSIS and then filtering it out is not recommended, especially when you can do the filtering at the source DB end itself. Consider the network bandwidth costs as well.
And this particular filter cannot be done any more efficiently in SSIS than it can at the database, so it is better to do it in the Oracle DB itself.
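In other words, make the source query itself do the filtering, e.g. (the schema and table names are placeholders):
SELECT *
FROM my_schema.my_table
WHERE ssis_control_flag = 'T'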
You can use a query with OPENROWSET as the source for the data flow instead of accessing the Oracle table directly.
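A rough sketch of that approach, assuming the Oracle OLE DB provider is installed and ad hoc distributed queries are enabled (the provider name, connection details and table are placeholders):
SELECT *
FROM OPENROWSET('OraOLEDB.Oracle',
                'MyOracleTNSName'; 'my_user'; 'my_password',
                'SELECT * FROM my_schema.my_table WHERE ssis_control_flag = ''T''') AS src;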
I am tasked with exporting the data contained inside a MaxDB database to SQL Server 200x. I was wondering if anyone has gone through this before and what your process was.
Here is my idea, but it's not automated.
1) Export data from MaxDB for each table as a CSV.
2) Clean the CSV to remove ? (which it uses for nulls) and fix the date strings.
3) Use SSIS to import the data into tables in SQL Server.
I was wondering if anyone has tried linking MaxDB to SQL Server or what other suggestions or ideas you have for automating this.
Thanks.
AboutDev.
I managed to find a solution to this. There is an open source MaxDB library that allows you to connect to it through .NET, much like the SQL Server provider. You can use that to get schema information and data, then write a little code to generate scripts to run in SQL Server to create the tables and insert the data.
MaxDb Data Provider for ADO.NET
If this is a one time thing, you don't have to have it all automated.
I'd pull the CSVs into SQL Server tables and keep them forever; they will help with any questions a year from now. You can prefix them all the same, "Conversion_" or whatever. There are no constraints or FKs on these tables. You might consider using varchar for every column (or just the ones that cause problems, or not at all if the data is clean), just to be sure there are no data type conversion issues.
Then pull the data from these conversion tables into the proper final tables. I'd use a single conversion stored procedure to do everything (but I like T-SQL). If the data isn't that large (millions and millions of rows or less), just work through and build out all the tables, printing log info as necessary, or inserting into exception/bad-data tables as necessary.
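A minimal sketch of that pattern (the file path, table and column names are made up, and the date style will depend on how MaxDB formats dates in the export):
-- 1. Raw landing table: everything varchar so nothing fails on load.
CREATE TABLE dbo.Conversion_Customers
(
    CustomerId   varchar(50)  NULL,
    CustomerName varchar(200) NULL,
    CreatedDate  varchar(50)  NULL
);

BULK INSERT dbo.Conversion_Customers
FROM 'C:\export\customers.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- 2. Conversion step: turn MaxDB's '?' into NULL and cast into the real types.
INSERT INTO dbo.Customers (CustomerId, CustomerName, CreatedDate)
SELECT CAST(NULLIF(CustomerId, '?') AS int),
       NULLIF(CustomerName, '?'),
       CONVERT(datetime, NULLIF(CreatedDate, '?'), 120)
FROM dbo.Conversion_Customers;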