SSIS Union All vs. SQL Server UNION ALL

I use SQL Server 2012 and SSIS. I have two tables on the same server and in the same database, and I need to transfer all records from both tables into a third table.
I need to add some columns (like an Execution ID and some package parameters) to the result of the UNION ALL, and after that transfer the records into the third table.
I have two solutions for doing this, but I don't know which one is more efficient.
Solution 1: Use two OLE DB Sources and the Union All component in SSIS.
Solution 2: Use UNION ALL on the SQL Server side and just one OLE DB Source in SSIS.
Which one is more efficient?

Always favor database operations when possible. If the two tables are in the same database, there is absolutely no reason to favor an SSIS operation over the query optimizer.
UNION ALL is not a fully blocking operation (in SSIS, the Union All transformation is only partially blocking), so there will be almost no difference in this case; but if this were a join or a more complex operation, the query optimizer would come into play.
Use the database solution as a rule of thumb.
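A minimal sketch of solution 2, assuming two tables T1 and T2 with identical columns (the column names Id and Amount, the execution ID value, and the SourceSystem parameter are all hypothetical). Python's sqlite3 module stands in for SQL Server here: the extra columns are appended to the UNION ALL result inside the one source query. In an SSIS OLE DB Source the `?` markers would be mapped to package variables or parameters; here they are simply bound in Python.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server database (assumption);
# T1/T2 and their columns Id and Amount are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Id INTEGER, Amount INTEGER);
    CREATE TABLE T2 (Id INTEGER, Amount INTEGER);
    INSERT INTO T1 VALUES (1, 10), (2, 20);
    INSERT INTO T2 VALUES (2, 20), (3, 30);
""")

# Single source query: UNION ALL both tables, then append the extra columns
# (a hypothetical execution ID and package parameter) as bound parameters.
SOURCE_QUERY = """
    SELECT u.Id, u.Amount, ? AS ExecutionID, ? AS SourceSystem
    FROM (
        SELECT Id, Amount FROM T1
        UNION ALL
        SELECT Id, Amount FROM T2
    ) AS u
"""

rows = conn.execute(SOURCE_QUERY, ("run-42", "CRM")).fetchall()
for row in rows:
    print(row)  # UNION ALL keeps the (2, 20) row from both tables
```

Because the constants are added once, outside the UNION ALL, each parameter only needs to be bound a single time.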

You can also consider doing it in a simple Execute SQL Task:
Insert into T3
select * from T1
union all
select * from T2
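Since T3 should also carry extra columns such as an execution ID, `SELECT *` only works if T3 has exactly the same columns as T1 and T2. A minimal sketch with an explicit column list, again using Python's sqlite3 as a stand-in for SQL Server and hypothetical table and column names; in the Execute SQL Task the `?` markers would be mapped to a package variable.

```python
import sqlite3

# SQLite stands in for SQL Server; T1, T2, T3 and their columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Id INTEGER, Amount INTEGER);
    CREATE TABLE T2 (Id INTEGER, Amount INTEGER);
    CREATE TABLE T3 (Id INTEGER, Amount INTEGER, ExecutionID TEXT);
    INSERT INTO T1 VALUES (1, 10), (2, 20);
    INSERT INTO T2 VALUES (3, 30);
""")

def load_t3(execution_id):
    """Copy T1 and T2 into T3 with one set-based INSERT ... SELECT."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            """
            INSERT INTO T3 (Id, Amount, ExecutionID)
            SELECT Id, Amount, ? FROM T1
            UNION ALL
            SELECT Id, Amount, ? FROM T2
            """,
            (execution_id, execution_id),
        )
    return cur.rowcount  # number of rows inserted by the statement

inserted = load_t3("run-42")
print(inserted, "rows inserted")
```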

Sorry to disagree with Cafe Con Leche, but once you are comfortable with SSIS, you should always use an SSIS task over T-SQL. Except for sort operations, SSIS is usually faster, and it provides error handling that is very easy to set up (including redirecting 'bad' rows so they can be processed or fixed later) as well as logging. Almost anything that can be done in SSIS can be done in T-SQL (especially if you are comfortable using xp_cmdshell), but it's much harder to reproduce in T-SQL the logging and error handling that are built into SSIS.
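To illustrate the "bad rows redirected" idea outside SSIS, here is a minimal Python/sqlite3 sketch (the Target table and its columns are hypothetical): rows that violate a constraint are collected together with their error message instead of aborting the whole load. It is only an analogy to an SSIS error output, not how SSIS implements it, and loading row by row gives up the speed of a single set-based statement.

```python
import sqlite3

# Hypothetical target table with constraints; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Target (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

def load_with_error_rows(rows):
    """Insert rows one at a time; rows that violate a constraint are set aside
    with the error text (similar in spirit to an SSIS error output)."""
    loaded, rejected = 0, []
    for row in rows:
        try:
            with conn:  # each row is its own transaction
                conn.execute("INSERT INTO Target (Id, Name) VALUES (?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError as exc:
            rejected.append((row, str(exc)))  # keep the reason for later fixing
    return loaded, rejected

loaded, rejected = load_with_error_rows([(1, "a"), (2, None), (1, "dup"), (3, "c")])
print(loaded, "loaded;", len(rejected), "rejected")
for row, reason in rejected:
    print(row, reason)
```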

Related

How can I query two separate non-linked SQL Servers?

My goal is to pull data from 5 tables into one result set using a UNION query.
The problem is that my tables are distributed across two separate servers (SQL Server v11.0 and SQL Server v13.0). They are not linked, they cannot be linked, and they have no relationship whatsoever.
Is there any way to do that?
That is not going to happen directly between versions 11 and 13. But if a union is really all you need, import the data into a staging area using bcp or your favorite ETL tool. If the staging area is on one of the two servers, you can do the union right there and avoid transferring any duplicates that the union would have removed anyway (assuming you want UNION and not UNION ALL).
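A minimal sketch of the staging idea, with Python's sqlite3 standing in both for the two unlinked servers and for bcp or an ETL tool (all table and column names are hypothetical): one server's rows are copied into a staging table on the other server, and the union then runs locally there.

```python
import sqlite3

# Two separate in-memory databases stand in for the two unlinked servers (assumption);
# the table and column names are hypothetical.
server_a = sqlite3.connect(":memory:")
server_b = sqlite3.connect(":memory:")
server_a.executescript("CREATE TABLE Customers (Name TEXT); INSERT INTO Customers VALUES ('Ann'), ('Bob');")
server_b.executescript("CREATE TABLE Customers (Name TEXT); INSERT INTO Customers VALUES ('Bob'), ('Cy');")

# Server A doubles as the staging area: copy server B's rows into a staging table there.
server_a.execute("CREATE TABLE Staging_B_Customers (Name TEXT)")
rows_from_b = server_b.execute("SELECT Name FROM Customers").fetchall()
with server_a:
    server_a.executemany("INSERT INTO Staging_B_Customers (Name) VALUES (?)", rows_from_b)

# The union now runs locally on server A; UNION removes the duplicate 'Bob',
# while UNION ALL keeps it.
union_rows = server_a.execute(
    "SELECT Name FROM Customers UNION SELECT Name FROM Staging_B_Customers ORDER BY Name"
).fetchall()
union_all_rows = server_a.execute(
    "SELECT Name FROM Customers UNION ALL SELECT Name FROM Staging_B_Customers"
).fetchall()
print(union_rows)
print(len(union_all_rows))
```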
You can try going through Microsoft Access. Connect Access to both servers through ODBC, do the union in Access, and then upload the result wherever you want.
Of course, this depends on the size of the tables.

What is the equivalent of 'SELECT * INTO' in SSIS

I am building an SSIS package in which I need to transfer some tables from an OData source into SQL Server.
So far I have implemented an "insert into" query to SQL Server from the tables I read from the OData source. Because there are 10+ tables, is there a way I can do a "select into" query in SSIS for a faster transfer of those tables?
SSIS has no built-in operation that creates a table on the destination based on a data set, which is what SELECT ... INTO does.
There is no easy tweak to do this either: SSIS is mostly designed for ETL with static metadata, that is, performing operations between different sources and destinations with consistent structures and data types. You might achieve what you need with custom scripts, but that would be completely outside of SSIS as well.
If you already know the structure of the data you will be inserting, create the destination tables first (with CREATE TABLE) and then use SSIS to map the corresponding columns. If your destination tables will be dynamic, you will have a hard time using regular SSIS operations to match the metadata of each table, since the metadata is fixed at design time.
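If the 10+ table structures are known up front, the CREATE TABLE scripts can be generated once and run before the package is designed. A minimal sketch, assuming a hypothetical hand-written metadata dictionary (the OData feed's real metadata is not shown here) and T-SQL-style type names; SQLite is used only to check that the generated statements parse.

```python
import sqlite3

# Hypothetical column metadata for the tables coming from the OData source.
TABLES = {
    "Customers": [("CustomerID", "INT NOT NULL"), ("Name", "NVARCHAR(100)")],
    "Orders": [("OrderID", "INT NOT NULL"), ("CustomerID", "INT"), ("OrderDate", "DATETIME2")],
}

def create_table_sql(table, columns):
    """Build a CREATE TABLE statement with bracket-quoted identifiers."""
    cols = ",\n    ".join(f"[{name}] {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE [{table}] (\n    {cols}\n);"

ddl = [create_table_sql(table, cols) for table, cols in TABLES.items()]
for stmt in ddl:
    print(stmt)

# Sanity check: SQLite also accepts bracket quoting and these type names.
conn = sqlite3.connect(":memory:")
for stmt in ddl:
    conn.execute(stmt)
```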
If the problem isn't the columns' data types but the speed of the operation (SELECT ... INTO is minimally logged under the simple and bulk-logged recovery models), then the fastest option when working with SQL Server is to use bulk insert on the destination component (for example, the "Table or view - fast load" access mode of the OLE DB Destination). It will be faster than regular inserts, but usually slower than performing a SELECT ... INTO directly in SQL.

SSIS 14 - Staging Area - Merge two sources is taking a lot of time

I have two tables:
Table A: 631 476 rows
Table B: 12 90 rows
Each table has the field ID, which I want to use as the key in the Merge component. In the following image, you can see that the process blocks before the Merge component. I have already tested with the Merge Join component, and the results are the same...
What other options do I have for doing this operation in SSIS 14?
Thanks!
If both source tables are on the same server, don't do it this way. Simply write a query on the SQL Server side,
something like this:
SELECT *
FROM [Table A]
INNER JOIN [Table B] ON [Table A].ID = [Table B].ID
ORDER BY ...
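A minimal sketch of the set-based join, with Python's sqlite3 standing in for SQL Server and hypothetical tables TableA/TableB (the brackets and spaces in the original names are dropped). Listing the columns explicitly avoids returning ID twice, as SELECT * would. Unlike the SSIS Merge and Merge Join transformations, which need their inputs sorted on the key, the database engine chooses its own join strategy.

```python
import sqlite3

# SQLite stands in for SQL Server; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ID INTEGER, ValueA TEXT);
    CREATE TABLE TableB (ID INTEGER, ValueB TEXT);
    CREATE INDEX IX_TableB_ID ON TableB (ID);
    INSERT INTO TableA VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO TableB VALUES (2, 'b2'), (3, 'b3'), (4, 'b4');
""")

# The join runs inside the engine, which decides how to match rows
# (and can use the index on TableB.ID); no pre-sorted inputs are required.
joined = conn.execute("""
    SELECT a.ID, a.ValueA, b.ValueB
    FROM TableA AS a
    INNER JOIN TableB AS b ON a.ID = b.ID
    ORDER BY a.ID
""").fetchall()
print(joined)
```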
As James Serra says in "When to use T-SQL or SSIS for ETL":
Performance – With T-SQL, everything is processed within the SQL engine. With SSIS, you are bringing all the data over to the SSIS memory space and doing the manipulation there. So if speed is an issue, usually T-SQL is the way to go, especially when dealing with a lot of records. Something like a JOIN statement in T-SQL will go much faster than using lookup tasks in SSIS. Another example: a MERGE statement in T-SQL has much better performance than an SCD task in SSIS for large tasks.
Features/capabilities – Some features can only be done in either T-SQL or SSIS. You can shred text in SSIS, but can’t in T-SQL. For example, text files with an inconsistent number of fields per row can only be done in SSIS. So certain tasks may force you into using one or the other.
Current skill set – Are the people in your IT department more familiar with SSIS or T-SQL?
Ease of development/maintenance – Of course, whatever one you are most familiar with will be the easiest, but if your skills at both are fairly even, then SSIS is usually easier to use because it is graphical, but sometimes you can develop quicker in T-SQL. For example, having to join a bunch of tables will require a bunch of tasks in SSIS, where in T-SQL it is one statement. So it might be easier to create the tasks to join the tables in SSIS, but it will take longer to build than writing a T-SQL statement.
Complexity – SSIS can be more complex because you might need to create many tasks to accomplish your objective, where in T-SQL it might just be one statement, like in the example above for joining tables.
Extensibility – SSIS has better extensibility because you can create a script task that uses C# that can do just about anything, especially for non-database related tasks. T-SQL is limited because it is only for database tasks. SSIS also has logging, which T-SQL does not.
Likelihood of deprecation/breaking changes – Minor issue, but T-SQL is always removing features in each release that will have to be rewritten.
Types/architecture of sources and destinations – SSIS is better if you have multiple types of sources. For example, it works really well with Oracle, XML, flat-files, etc. SSIS was designed from the beginning to work well with other sources, where T-SQL is designed for SQL Server and it requires more steps to access other sources, and there are additional limitations when doing so.
Local regulations – Are there some company standards you have to adhere to that would limit which tool you can use?
I have had issues doing joins or merges in SSIS. Instead, I write the T-SQL version and run it in an Execute SQL Task. It always runs much faster for me that way.

Best way to perform distributed SQL query and joins, calling from .Net code

Here's my scenario:
I have to query two PeopleSoft databases on different servers (both are SQL Server 2000) and join the data. My application is a .NET application (BizTalk).
I'm wondering which option is best with regard to performance:
Use standard SELECT queries to get the data and do the join in memory (e.g., with LINQ).
Generate complex dynamic queries using linked servers, e.g.
select blah
from Server1.HRDB.dbo.MyTable1 t1
left join Server2.FinanceDb.dbo.MyTable2 t2
    on t1.KeyColumn = t2.KeyColumn -- KeyColumn is a placeholder join key
Use standard SELECT queries to get the data into an intermediate/staging SQL Server database and do my queries/joins on that database instead.
Should I consider using SSIS? (Are there features there that might be better than doing the join in memory, e.g., with LINQ?)
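A minimal sketch of the in-memory option, written in Python rather than C#/LINQ: each source is read with a plain SELECT (simulated here by hypothetical lists of dicts with a hypothetical EmplId key), and the left join happens in application memory using a hash index on one input. The whole indexed input must fit in memory, which is the main cost of this approach.

```python
from collections import defaultdict

def hash_left_join(left_rows, right_rows, left_key, right_key):
    """Left-join two lists of dicts in memory: build a hash index on the right
    input, then probe it once per left row. Unmatched left rows pair with None."""
    index = defaultdict(list)
    for right in right_rows:
        index[right[right_key]].append(right)
    joined = []
    for left in left_rows:
        # .get avoids adding empty entries to the defaultdict for missing keys
        for right in index.get(left[left_key], [None]):
            joined.append((left, right))
    return joined

# Hypothetical rows, as if fetched with simple SELECTs from each server.
hr_rows = [{"EmplId": "E1", "Name": "Ann"}, {"EmplId": "E2", "Name": "Bob"}]
finance_rows = [{"EmplId": "E1", "Account": "4000"}]

result = hash_left_join(hr_rows, finance_rows, "EmplId", "EmplId")
for left, right in result:
    print(left["Name"], right["Account"] if right else None)
```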
I wish I could use stored procedures on the source database, but the owners of the PeopleSoft database refuse to allow them.
The main constraints we have are that the source database is old (SQL Server 2000) and that the performance of the source database is paramount. Whatever queries I run on this server must not block other users. Hence, the DBAs are adamant about not allowing stored procedures. They also believe that queries involving linked servers will trump (i.e., take priority over) other queries being run against the database.
Any feedback would be greatly appreciated.
Thanks!
Update: additional background information on the project
We are primarily integrating PeopleSoft databases (HR and Finance) into another product. Some parts are simple, like AccountCode and Department. Others are more complex, like the personal data, job, and leave accrual. Some are real-time, others are scheduled, and others are 'batch' (e.g., at payroll runs).
Regardless, we have to get source data out of the PeopleSoft databases, and my hope had been to let the (source) database do the 'heavy' lifting by executing SQL queries. I don't really want BizTalk, SSIS, or C# LINQ to be the ones doing the transformations/filtering.
Definitely open to suggestions.

How do I join two tables from two different databases?

Is there any way to write a query in DbVisualizer that joins two tables that are in two different databases on the same server? I used the following for SQL Server:
Select * from table union select * from database.dbo.table2
I tried this in DbVisualizer, and it didn't work. How do I do this?
If the databases are on different servers, you need to make sure they are set up as linked servers.
Also be warned that the optimizer is relatively weak in this scenario, same server or not. The problem is that the statistics used for weighting costs of different operations aren't necessarily meaningful between different databases, especially at the point where the two databases will "intersect". So performance isn't what it could be.
If DbVisualizer supports views, manually set up a view of table2 in your database:
create view table2 as select * from database.dbo.table2
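In SQLite, the closest analog to `database.dbo.table2` is an attached database addressed as `schema.table`. A minimal sketch, using Python's sqlite3 as a stand-in (this is neither DbVisualizer nor SQL Server syntax; `otherdb`, `table1`, and `table2` are hypothetical names), that creates the local view and runs the union through it.

```python
import sqlite3

# SQLite's ATTACH plays the role of a second database on the same server;
# "otherdb", table1 and table2 are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS otherdb")
conn.executescript("""
    CREATE TABLE main.table1 (Name TEXT);
    CREATE TABLE otherdb.table2 (Name TEXT);
    INSERT INTO main.table1 VALUES ('Ann'), ('Bob');
    INSERT INTO otherdb.table2 VALUES ('Bob'), ('Cy');
    -- Local view over the other database's table. A TEMP view is used because
    -- SQLite does not let a persistent view refer to another attached database.
    CREATE TEMP VIEW table2 AS SELECT Name FROM otherdb.table2;
""")

rows = conn.execute(
    "SELECT Name FROM table1 UNION SELECT Name FROM table2 ORDER BY Name"
).fetchall()
print(rows)
```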
I don't think it can be done. I resolved the situation by running a nightly data transfer to the SQL Server and doing the union select from there.
