SSIS data flow task or stored procedure? - sql-server

Which is the faster way to load data from a table on one server into a table on a different server: a Data Flow Task, or a stored procedure in an Execute SQL Task? Say the table has around 100 million records and the necessary indexing.
Where I work, stored procedures are used, and I would like to improve the execution time. I was wondering whether changing to a DFT could make it any faster. In some cases there are a lot of JOINs involved. So, generally speaking, which offers better performance (regardless of the table structure or execution plan)?
Any help is appreciated.

Related

SQL Server 2014 - what is the most efficient way to move data from one table to another across databases same instance

I've found various opinions on this one, including a reference to an article that indicates that 'select into' operators can run in parallel from 2014+ and may or may not be more efficient than 'insert' as a result.
My use case is moving data from one table to another identical table across databases, same instance, 2014. The inserts will be 5-10M rows-ish, and I don't care about logging just efficiency. I need a general recommendation, not a case-by-case analysis.
I realize that there are other factors (row length, etc) that might affect the answer, but I'm looking for the best place to start. I can always try other methods if necessary.
So what's the most efficient way to load a table in one database from an identical table in another?
Thanks in advance!
I would suggest an SSIS (SQL Server Integration Services) package that performs BULK operations, although 5M rows isn't significant in our current world.
Since "it depends", you'll have to help us understand what you're trying to save. INSERT INTO is nice only in that it is self-contained and "easy." If this is a one-time deal you might do it this way and stop thinking about it.
If, however, you're going to be shoveling 10M records daily, you might consider a scheduled SSIS package. There is overhead to maintaining the package, but it is generally faster. If you are reloading data for testing purposes (reset to baseline), then the SSIS package is a good way to go.
You might also look at this article: https://dba.stackexchange.com/questions/99367/insert-into-table-select-from-table-vs-bulk-insert
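For the pure T-SQL route the question mentions, here is a minimal sketch with placeholder names (SourceDb.dbo.Orders and TargetDb.dbo.Orders), assuming both databases sit on the same instance:

-- INSERT INTO an existing, identical table; with the TABLOCK hint and the
-- usual preconditions (SIMPLE or BULK_LOGGED recovery, empty target or heap)
-- the insert can be minimally logged.
INSERT INTO TargetDb.dbo.Orders WITH (TABLOCK)
SELECT *
FROM SourceDb.dbo.Orders;

-- SELECT INTO creates the target table and can use a parallel plan on
-- SQL Server 2014+; any indexes have to be built afterwards.
SELECT *
INTO TargetDb.dbo.Orders_Copy
FROM SourceDb.dbo.Orders;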

In SQL Server, should I create a synonym for a table or a stored procedure?

If this has been answered elsewhere, please post a link to it, yell at me, and close this question. I looked around and saw similar things, but didn't find exactly what I was looking for.
I am currently writing several stored procedures that require data from another database. That database could be on another server or the same server; it just depends on the customer's network. I want to use a synonym so that if the location of the table I need data from changes, I can update the synonym once and not have to go back into all of the stored procedures and update their references.
What I want to know is what the best approach is with a synonym. I read a post on SO before that said there was a performance hit when using a view or table this way (especially across a linked server). This may be because SQL Server cannot always make use of the indexes on the underlying tables when they are referenced through synonyms. I can't find that post anymore or I would post a link to it. It was suggested that the best approach is to create a synonym for a stored procedure, and load the resulting data into a memory or temp table.
I may not have my facts straight on that, though, and was hoping for some clarification. From what I can tell, creating and loading data into memory tables generally accounts for a large percentage of the execution plan. Is using a stored procedure worth the extra effort of loading the data into a table, over just being able to run queries against a view or table? What is the most efficient way to get data from another database using a synonym?
Thanks!
Synonyms are just defined aliases to make redirection easier; they have no performance impact worth considering. And yes, they are advisable for redirection; they do make it a lot easier.
What a synonym points to on the other hand can have a significant performance impact (this has nothing to do with the synonym itself).
Using tables and views in other databases on the same server-instance has a small impact. I've heard 10% quoted, and I can fairly say that I have never observed it to be higher than that. This impact is mostly from reductions in the optimizer's efficiency, as far as I can tell.
Using objects on other server-instances, whether through linked server definitions, or OpenQuery is another story entirely. These tend to be much slower, primarily because of the combined effects of MS DTC and the optimizer deciding to do almost no optimizations for the remote aspects of a query. This tends to be bearable for small queries and small remote tables, but increasingly awful the bigger the query and/or remote table is.
Most practitioners eventually decide on one of two fixes for this problem: either 1) if it is a table, just copy the remote table rows to a local #temp table first and then query that, or 2) if it is more complex, write a stored procedure on the remote server and execute it with INSERT INTO ... EXECUTE ... AT to retrieve the remote data.
As for how to use/organize your synonyms, my advice would be to create a separate owner-schema in your database (with an appropriate name like [Remote]) and then put all of your Synonyms there. Then when you need to redirect, you can write a stored procedure that will automatically find all of the synonyms pointing to the old location and change them to the new location (this is how I do it). Makes it a lot easier to deal with location/name changes.
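A minimal sketch of that layout, with made-up names (the [Remote] schema, a Customer synonym, and two linked servers):

-- Keep all synonyms to remote objects in their own schema.
CREATE SCHEMA [Remote];
GO
CREATE SYNONYM [Remote].[Customer]
    FOR [LinkedServerA].[SalesDb].[dbo].[Customer];
GO
-- Redirection: synonyms cannot be altered in place, so drop and recreate
-- to point at the new location.
DROP SYNONYM [Remote].[Customer];
GO
CREATE SYNONYM [Remote].[Customer]
    FOR [LinkedServerB].[SalesDb].[dbo].[Customer];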
Choosing Option 1 or 2 depends on the nature of your query. If you can retrieve the data with a relatively simple Select with a good Where clause to restrict the number of rows then Option 1 is generally the best choice. DO NOT JOIN local and remote tables. Pull the remote data to a local #temp table and join the local tables on that temp table in a separate query.
If the query is more complex, with multiple joins and/or complex WHERE conditions, then retrieving the data into a local #temp table via a remote procedure call is generally the best choice. Again, DO NOT JOIN local and remote objects, and minimize the number and size of the parameters to the remote procedure.
The balance point between a "simple Select" and a "complex Select" is a matter of knowing your data and testing.
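A rough sketch of both patterns, with invented names ([Remote].[Customer] is a synonym for a linked-server table, and SalesDb.dbo.usp_GetRecentOrders is a hypothetical procedure on the remote server):

-- Option 1: pull a filtered copy of the remote table into a local #temp
-- table, then join local tables against the #temp table in a separate query.
SELECT CustomerId, CustomerName
INTO #RemoteCustomer
FROM [Remote].[Customer]
WHERE ModifiedDate >= '20140101';

SELECT o.OrderId, rc.CustomerName
FROM dbo.Orders AS o
JOIN #RemoteCustomer AS rc ON rc.CustomerId = o.CustomerId;

-- Option 2: for more complex retrievals, run a procedure on the remote server
-- and land its result set locally with INSERT INTO ... EXECUTE ... AT
-- (the linked server needs RPC Out enabled).
CREATE TABLE #RemoteOrders (OrderId int, CustomerId int, OrderTotal money);

INSERT INTO #RemoteOrders (OrderId, CustomerId, OrderTotal)
EXECUTE ('EXEC SalesDb.dbo.usp_GetRecentOrders @Days = 30') AT [LinkedServerA];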
HTH :)

Choosing a clever solution: SQL Server or file processing for bulk data?

We have a number of files generated from a test, with each file having almost 60,000 lines of data. The requirement is to calculate a number of parameters from the data present in these files. There are two ways of processing the data:
Each file is read line-by-line and processed to obtain required parameters
The file data is bulk copied into the database tables and required parameters are calculated with the help of aggregate functions in the stored procedure.
I was trying to figure out the overheads of both methods. Since a database is meant to handle such situations, I am mainly concerned with overheads that may become a problem as the database grows larger.
Will it affect the retrieval rate from the tables and consequently make the calculations slower? Would file processing then be a better solution, taking the database size into account? And would database partitioning solve the problem for a large database?
Did you consider using map-reduce (say under Hadoop, maybe with HBase) to perform these tasks? If you're looking for high throughput with big data volumes, this is a very scalable approach. Of course, not every problem can be addressed effectively using this paradigm, and I don't know the details of your calculation.
If you set up indexes correctly you won't suffer performance issues. Additionally, there is nothing stopping you loading the files into a table and running the calculations and then moving the data into an archive table or deleting it altogether.
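If you do go the database route, here is a minimal sketch of the load-and-aggregate step, assuming a comma-delimited file with a header row and a staging table with matching columns (all names here are made up):

-- Load one result file into a staging table; FIRSTROW = 2 skips the header line.
BULK INSERT dbo.TestResultStaging
FROM 'C:\TestRuns\run_001.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

-- The parameter calculations then become plain aggregate queries.
SELECT ParameterName,
       AVG(MeasuredValue) AS AvgValue,
       MAX(MeasuredValue) AS MaxValue
FROM dbo.TestResultStaging
GROUP BY ParameterName;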
You can run a query directly against the text file from SQL:
-- Query the text file in place via the Microsoft Text ODBC driver:
SELECT *
FROM OPENROWSET('MSDASQL',
    'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir=C:\;',
    'SELECT * FROM [text.txt];');
Distributed (ad hoc) queries need to be enabled to run this.
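As a sketch, that server-level switch looks like this:

-- OPENROWSET/OPENDATASOURCE against ad hoc sources is disabled by default.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;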
Or, as you mentioned, you can load the data into a table (using SSIS, BCP, the query above, etc.). You did not mention what "the database will be larger" means in your case; 60k lines per file is not much for a table (meaning it will perform well).

Direct SQL or combine it in a procedure? Which is more efficient?

In my recent project I have to run some queries through dynamic SQL, but I'm curious about the efficiency of the different approaches:
1) Build the SQL statements on my application server and then send them to the database to run the query.
2) Send my variables to the database, build the statement in a stored procedure, and finally run the query.
Hope someone can help.
BTW, I use .NET and SQL Server.
Firstly, one of the main things you should do is to parameterise your SQL - whether that be by wrapping it up as a stored procedure in the DB, or by creating the SQL statement in your application code and then firing the whole thing into the DB (a small sketch follows the list below). This will mean:
prevention against SQL injection attacks by not directly concatenating user-entered values into a SQL statement
execution plan reuse (subsequent executions of that query, regardless of parameter values, will be able to reuse the original execution plan) (NB: if you don't parameterise it yourself, Forced Parameterisation can achieve something similar)
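Whichever side builds the statement, what should reach SQL Server is a parameterised batch. A small sketch via sp_executesql, with invented table and parameter names:

-- The statement text stays constant while only the parameter values change,
-- so the cached plan is reused across executions.
DECLARE @sql nvarchar(max);
SET @sql = N'SELECT CustomerId, CustomerName
             FROM dbo.Customer
             WHERE Country = @Country AND CreatedDate >= @Since;';

EXEC sp_executesql
    @sql,
    N'@Country nvarchar(50), @Since datetime',
    @Country = N'Germany',
    @Since = '20100101';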
Stored procedures do offer some extra advantages:
security: you only need to grant EXECUTE permission on the stored procedures, not direct access to the underlying tables (see the small GRANT example after this list)
maintainability: a change to a query does not require an application code change; you can just change the sproc in the DB
network traffic: not necessarily a major point, but you're sending less over the wire, especially if the query is pretty large/complex
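For the security point, a minimal example (the role and procedure names are placeholders):

-- The application principal gets EXECUTE on the procedure only; no direct
-- SELECT/INSERT/UPDATE rights on the underlying tables are needed.
GRANT EXECUTE ON OBJECT::dbo.usp_GetCustomerOrders TO AppRole;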
Personally, I use stored procedures most of the time. When I do need to build up SQL dynamically in application code, it is always parameterised.
Best is to use a stored procedure and pass parameters from your application; the procedure's execution plan is compiled on first use and cached for reuse, which saves time on subsequent calls.
You can refer to this URL, which has the details: http://mukund.wordpress.com/2005/10/14/advantages-and-disadvantages-of-stored-procedure/
Happy coding!!

Saving / Caching Stored Procedure results for better performance? (SQL Server 2005)

I have an SP that has been worked on by 2 people now and still takes 2 minutes or more to run. Is there a way to have it pre-run and the results stored in a cache or somewhere else, so that when my client needs to look at this data in a web browser he doesn't want to hang himself or me?
I am nowhere near a DBA, so I am kind of at the mercy of whoever I hire to figure this out for me, so having a little knowledge up front would really help me out.
If it truly takes that long to run, you could schedule the process to run using SQL Agent, and have the output go to a table, then change the web application to read the table rather than execute the stored procedure. You'd have to decide how often to run the refresh, and deal with the requests that occur while it is being refreshed, but that can be dealt with by having two output tables, one live and one for the latest refresh.
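A rough sketch of that refresh pattern with made-up names (dbo.usp_SlowReport is the two-minute procedure; the web app reads dbo.SlowReportLive instead of calling it):

-- Run this from a SQL Agent job on whatever schedule the data can tolerate.
CREATE PROCEDURE dbo.usp_RefreshSlowReport
AS
BEGIN
    SET NOCOUNT ON;

    -- Build the new result set off to the side.
    TRUNCATE TABLE dbo.SlowReportStaging;
    INSERT INTO dbo.SlowReportStaging
    EXEC dbo.usp_SlowReport;

    -- Swap it into the table the web app reads, inside one transaction so
    -- readers never see a half-refreshed set.
    BEGIN TRANSACTION;
        TRUNCATE TABLE dbo.SlowReportLive;
        INSERT INTO dbo.SlowReportLive
        SELECT * FROM dbo.SlowReportStaging;
    COMMIT TRANSACTION;
END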
But I would take another look at the procedure, look at the execution plan and see where it is slow, make sure it is not doing full table scans.
Preferred solutions in this order:
Analyze the query and optimize accordingly
Cache it in the application (you can use HttpRuntime.Cache even if it is not an ASP.NET application).
Cache SPROC results in a table in the DB and add triggers to invalidate the cache (clear the table). A call to the SPROC would first look to see whether there is any data in the cache table: if not, run the SPROC and store the result in the cache table; if so, return the data from that table. The triggers on the "source" tables for the SPROC would just DELETE FROM CacheTable to "clear the cache" (depending on what your sproc is doing and its dependencies, you may even be able to partially update the cache table from the trigger, but all of this quickly gets difficult to maintain... sometimes you gotta do what you gotta do). This approach lets the cache table update itself as needed: you will always have the latest data, and the SPROC will only run when needed.
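A compressed sketch of option 3 with invented names (dbo.usp_ExpensiveReport is the slow procedure, assumed to return the two cached columns, and dbo.SourceTable is one of the tables it depends on):

-- Table that holds the cached result set.
CREATE TABLE dbo.SprocResultCache (ReportKey int, ReportValue money);
GO
-- Any change to the source data clears the cache.
CREATE TRIGGER trg_SourceTable_InvalidateCache
ON dbo.SourceTable
AFTER INSERT, UPDATE, DELETE
AS
    DELETE FROM dbo.SprocResultCache;
GO
-- Wrapper the application calls: serve from the cache when populated,
-- otherwise run the expensive procedure and cache its output.
CREATE PROCEDURE dbo.usp_ExpensiveReport_Cached
AS
BEGIN
    SET NOCOUNT ON;

    IF NOT EXISTS (SELECT 1 FROM dbo.SprocResultCache)
    BEGIN
        INSERT INTO dbo.SprocResultCache (ReportKey, ReportValue)
        EXEC dbo.usp_ExpensiveReport;
    END

    SELECT ReportKey, ReportValue FROM dbo.SprocResultCache;
END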
Try "Analyze query in database engine tuning advisor" from the Query menu.
I usually script the procedure to a new window, take out the query definition part and try different combinations of temp tables, regular tables and table variables.
You could cache the result set in the application as opposed to the database, either in memory by keeping an instance of the datatable around, or by serializing it to disk. How many rows does it return?
Is it too long to post the code here?
OK first things first, indexes:
What indexes do you have on the tables and is the execution plan using them?
Do you have indexes on all the foreign key fields?
Second, does the proc use any of the following performance killers:
a cursor
a subquery
a user-defined function
select *
a search criteria that starts with a wildcard
Third, can the WHERE clause be rewritten to be sargable? There is more than one way to write almost everything, and some ways perform better than others.
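As a small illustration of the sargability point (table and column names invented):

-- Non-sargable: the function wrapped around the column prevents an index seek,
-- as does a search pattern that starts with a wildcard.
SELECT OrderId FROM dbo.Orders WHERE YEAR(OrderDate) = 2009;
SELECT CustomerId FROM dbo.Customer WHERE LastName LIKE '%son';

-- Sargable rewrite of the first query: leave the indexed column alone and
-- express the condition as a range the optimizer can seek on.
SELECT OrderId
FROM dbo.Orders
WHERE OrderDate >= '20090101' AND OrderDate < '20100101';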
I suggest you buy your developers some books on performance tuning.
Likely your proc can be fixed, but without seeing the code, it is hard to guess what the problems might be.
