How to run multiple stored procedures in parallel using ssis? - sql-server

We have a list of stored procedures (more than 1000) in a table which need to be executed every morning.
The stored procedures do not have any dependency with each other.
We tried a WHILE loop and a cursor, but both took a long time to execute.
We then thought of creating a job for each stored procedure and starting them with sp_start_job (which is called asynchronously); that gave us some level of parallelism.
Problems arose as new stored procedures were added and the list became huge:
Sometimes people forgot to create the job for a new stored procedure.
The DB got flooded with a large number of jobs (a manageability issue for the DBA).
Note: the list may be altered any day (stored procedures can be added to or removed from it).

If the SPs are long-running, I would split the 1000 SPs into 5-10 groups, create one SSIS package per group, and then an Agent job for each package. Then schedule those jobs to start at the same time.
There are other ways to achieve this, such as loops or scripting. Test the different approaches and go with the one that performs best.
Note: Performance of the SSIS execution depends on your Memory, Processor and Hardware.
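The grouping described above could be sketched as follows. This is only an illustration: the table dbo.ProcList, its proc_name column, and the choice of 8 groups are assumptions, not part of the original setup. Each package (or job) would run the same script with a different @group_no.

```sql
-- Assumption: the procedure list lives in a hypothetical table
-- dbo.ProcList(proc_name sysname); real table and column names will differ.
DECLARE @group_no INT = 1;          -- group handled by this package/job (1..8)
DECLARE @proc_name SYSNAME;

DECLARE proc_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT g.proc_name
    FROM (
        -- NTILE(8) spreads the rows over 8 roughly equal groups
        SELECT proc_name,
               NTILE(8) OVER (ORDER BY proc_name) AS group_no
        FROM dbo.ProcList
    ) AS g
    WHERE g.group_no = @group_no;

OPEN proc_cursor;
FETCH NEXT FROM proc_cursor INTO @proc_name;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC @proc_name;                -- runs the procedure whose name is in the variable
    FETCH NEXT FROM proc_cursor INTO @proc_name;
END;
CLOSE proc_cursor;
DEALLOCATE proc_cursor;
```

Because the groups are computed from the list table at run time, adding or removing a procedure does not require creating or dropping a job.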

Adding to @Nick.MacDermaid: you can use the MaxConcurrentExecutables property of the package to implement custom parallelism. Of course, you would need multiple containers and corresponding stored proc groups.
Parallel Execution in SSIS
MaxConcurrentExecutables, a property of the package. It defines how
many tasks (executables) can run simultaneously. It defaults to -1
which is translated to the number of processors plus 2. Please note
that if your box has hyperthreading turned on, it is the logical
processor rather than the physically present processor that is
counted.

Hi, you can use the following code to generate a script that runs all your stored procedures. If you add a new procedure, it is automatically included in the list.
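-- Build one EXEC statement per stored procedure in the current database
SELECT 'EXEC ' + QUOTENAME(SPECIFIC_SCHEMA) + '.' + QUOTENAME(SPECIFIC_NAME) + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE';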
After this, save the result set as a tab-delimited text file in a location of your choice.
Then import the text into an Execute SQL Task. The first answer to this question works well:
SSIS: How do I pull a SQL statement from a file into a string variable?
Execute the task and it should work. If you need to narrow the list of procedures, give them a specific name prefix and filter on it in the WHERE clause.
It will run in serial. Sorry, I don't have enough rep to comment yet.

Related

Is there an alternative to SET STATISTICS TIME which also shows the statements?

The SET STATISTICS TIME statement is only useful during development, where it helps you performance-tune each additional statement as you add it to the query or UDF/SP you are working on. However, when you have to performance-tune existing code, e.g. an SP with hundreds or thousands of lines, its output is practically useless because it is not clear which SQL statement each recorded time belongs to.
Is there an alternative to SET STATISTICS TIME that also shows the statements the recorded times belong to?
I would recommend using a more advanced tool. It shows a single call of an SP with all of its internal details, keeps a history of the different runs that can be commented on and analyzed later, and exposes stats, index usage, IO and waits on separate tabs. Tool: SentryOne Plan Explorer (free).
If your Stored Procedures are granular then you could use this DMV to get an idea of times.
SELECT
DB_NAME(qs.database_id) AS DBName
,qs.database_id
,qs.object_id
,OBJECT_NAME(qs.object_id,qs.database_id) AS ObjectName
,qs.cached_time
,qs.last_execution_time
,qs.plan_handle
,qs.execution_count
,total_worker_time
,last_worker_time
,min_worker_time
,max_worker_time
,total_physical_reads
,last_physical_reads
,min_physical_reads
,max_physical_reads
,total_logical_writes
,last_logical_writes
,min_logical_writes
,max_logical_writes
,total_logical_reads
,last_logical_reads
,min_logical_reads
,max_logical_reads
,total_elapsed_time
,last_elapsed_time
,min_elapsed_time
,max_elapsed_time
FROM
sys.dm_exec_procedure_stats qs
I'd create an extended events session similar to the one below:
CREATE EVENT SESSION [proc_statments] ON SERVER
ADD EVENT sqlserver.module_end(
WHERE ([object_name]=N'usp_foobar')
),
ADD EVENT sqlserver.sp_statement_completed(
SET collect_object_name=(1),collect_statement=(1)
WHERE ([object_name]=N'usp_foobar'))
ADD TARGET package0.event_file(SET filename=N'proc_statments')
WITH (TRACK_CAUSALITY=ON)
GO
This tracks both stored procedure and stored procedure statement completion for a procedure called usp_foobar. Within the event itself, there's an identifier that helps you tie together which statements were executed as a result of having executed a specific procedure (that's what the TRACK_CAUSALITY is for).
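A minimal sketch of how the captured events might be read back, assuming the session above was created as shown and that a relative file pattern resolves to the instance's default log directory. The dbo schema for usp_foobar, the column aliases, and the chosen XML paths are illustrative assumptions; check the actual event XML on your version before relying on them.

```sql
-- Start the session, run the procedure under test, then stop the session.
ALTER EVENT SESSION [proc_statments] ON SERVER STATE = START;
EXEC dbo.usp_foobar;   -- assumption: the procedure lives in schema dbo
ALTER EVENT SESSION [proc_statments] ON SERVER STATE = STOP;

-- Read the event file and pull out statement text, duration and causality id.
SELECT
    x.event_xml.value('(event/@name)[1]', 'nvarchar(128)') AS event_name,
    x.event_xml.value('(event/data[@name="duration"]/value)[1]', 'bigint') AS duration,
    x.event_xml.value('(event/data[@name="statement"]/value)[1]', 'nvarchar(max)') AS statement_text,
    -- attach_activity_id is added by TRACK_CAUSALITY and ties statements to one execution
    x.event_xml.value('(event/action[@name="attach_activity_id"]/value)[1]', 'nvarchar(100)') AS activity_id
FROM (
    SELECT CAST(event_data AS XML) AS event_xml
    FROM sys.fn_xe_file_target_read_file(N'proc_statments*.xel', NULL, NULL, NULL)
) AS x;
```

Each row pairs a duration with the statement text, which is the information SET STATISTICS TIME does not give you.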

how to output multiple files in 1 stored procedure via SSIS

I am working in SQL Server 2008 and BIDS. Due to some performance problems, I am re-designing my current architecture. Currently, I have a stored procedure that has many INSERT INTO SELECT statements inside of it. In my new architecture, I am trying to get the performance of SSIS for inserts (instead of INSERT INTO in SSMS). So, my new stored proc will still have all of the SELECT statements (just no INSERT INTO before each of them). I will call this stored proc in SSIS (with a few parameters supplied that are needed by the SELECTs). My goal is to have each SELECT write to a separate flat file. (Actually, certain groups of SELECTs will write to separate flat files, such that I have just a few -- instead of a billion -- flat file connection managers.) I know how to execute a stored proc in SSIS and have it write a multiple-row set to a flat file. But is it possible to have the execution of 1 stored proc in SSIS write several multiple-row sets to several flat files? If so, how can it be done?
You can have one stored proc write to as many files as you want. Please look at this article by Phil Factor, https://www.simple-talk.com/sql/t-sql-programming/reading-and-writing-files-in-sql-server-using-t-sql/
However, you are losing all the power of SSIS, such as redirection of error rows, logging, and parallel processing. What you need to do sounds like a perfect SSIS task (or series of tasks).
Using a Data Flow for dynamic export is not possible because of the strict metadata architecture of SSIS. But you can do it with a Control Flow task: write a BCP command in an Execute Process Task and call it for each table you want to export.
Steps:
Call select * from information_schema.tables and grab result set into variable
Use foreach Loop task to loop through tables
Use execute process task to call BCP in your loop.
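As a rough illustration of the steps above, the command lines for the Execute Process Task could be generated from the same metadata query. The export folder C:\export\ and the .txt extension are assumptions, and the sketch assumes object names without spaces or special characters.

```sql
-- One bcp command per base table: -c = character format,
-- -T = trusted (Windows) connection, -S = server name.
SELECT 'bcp "' + TABLE_CATALOG + '.' + TABLE_SCHEMA + '.' + TABLE_NAME
     + '" out "C:\export\' + TABLE_NAME + '.txt" -c -T -S "' + @@SERVERNAME + '"'
       AS bcp_command
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
```

In the package, the Foreach Loop would hand each table name to the Execute Process Task, which runs bcp with arguments of this shape.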

Run automatically generated scripts parallel in sql server

I am creating a process that automates testing the consistency in database tables across servers.
I have a test_master table which contains following columns:
test_id, test_name, test_status
and Job_master table which contains following columns:
jid, test_id, job_name, job_type, job_path, job_status,
server_ip, db, error_description, op_table, test_table,
copy_status, check_status
There can be multiple jobs for a particular test. The jobs are logical jobs (and not sql agent jobs), they can be script, procedure or ssis package.
So I have made an ssis package :
In Pre-execute, it takes up tests which aren't done yet.
Each Job runs and writes the name of live table into op_table field
In post-execute, the live tables are getting copied to a test database environment and table name is put into test_table.. and testing will be performed there only.
Here the jobs are running in a loop... Is there a way to let the jobs run in parallel because they are independent of each other....
Can I write an sql procedure for this inside of this loop or is there any other way I can do this..
Any new ideas are welcome...
Thank you very much.. :)
Very roughly, I would put the approach as below:
SQL bits
Wrap whatever SQL code is part of a "job" in a stored proc. Inside this proc, populate a variable with the SQL to run and execute it using dynamic SQL. Update the job status in the same proc, using a TRY-CATCH-THROW construct to record failures.
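A minimal sketch of such a wrapper, assuming SQL Server 2012 or later (THROW is not available earlier). The procedure name, the @sql parameter, and the status values 'Done' and 'Failed' are hypothetical; the table and column names (Job_master, jid, job_status, error_description) come from the question.

```sql
CREATE PROCEDURE dbo.usp_run_job      -- hypothetical name
    @jid INT,
    @sql NVARCHAR(MAX)                -- the SQL text that makes up the job
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        -- Run the job's SQL dynamically
        EXEC sys.sp_executesql @sql;

        UPDATE dbo.Job_master
        SET job_status = 'Done'       -- assumed status value
        WHERE jid = @jid;
    END TRY
    BEGIN CATCH
        -- Record the failure, then re-raise so the caller also sees it
        UPDATE dbo.Job_master
        SET job_status = 'Failed',    -- assumed status value
            error_description = ERROR_MESSAGE()
        WHERE jid = @jid;
        THROW;
    END CATCH
END;
```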
Packages
Populate the names of the packages in an SSIS string variable in delimited fashion (or use an object variable, whatever suits you). Then, in a Script Task, iterate through the list of packages and launch them using the dtexec command. It is best to have the invoked packages update their own job status. If that is not an option, use a try-catch construct and update the job statuses accordingly.
Do a check on the job_type variable on top of the SSIS package(using precedence constraint) and route them into the correct 'block'.

Inline SQL versus stored procedure

I have a simple SELECT statement with a couple columns referenced in the WHERE clause. Normally I do these simple ones in the VB code (setup a Command object, set Command Type to text, set Command Text to the Select statement). However I'm seeing timeout problems. We've optimized just about everything we can with our tables, etc.
I'm wondering if there'd be a big performance hit just because I'm doing the query this way, versus creating a simple stored procedure with a couple params. I'm thinking maybe the inline code forces SQL to do extra work compiling, creating query plan, etc. which wouldn't occur if I used a stored procedure.
An example of the actual SQL being run:
SELECT TOP 1 * FROM MyTable WHERE Field1 = @Field1 ORDER BY ID DESC
A well formed "inline" or "ad-hoc" SQL query - if properly used with parameters - is just as good as a stored procedure.
But this is absolutely crucial: you must use properly parametrized queries! If you don't - if you concatenate together your SQL for each request - then you don't benefit from these points...
Just like with a stored procedure, upon first executing, a query execution plan must be found - and then that execution plan is cached in the plan cache - just like with a stored procedure.
That query plan is reused over and over again, if you call your inline parametrized SQL statement multiple times - and the "inline" SQL query plan is subject to the same cache eviction policies as the execution plan of a stored procedure.
Just from that point of view - if you really use properly parametrized queries - there's no performance benefit for a stored procedure.
Stored procedures have other benefits (like being a "security boundary" etc.), but just raw performance isn't one of their major plus points.
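For illustration, this is roughly what a properly parametrized version of the query from the question looks like when expressed in T-SQL. The INT type and the value 42 are assumptions, since the question does not state the type of Field1. A parameterized command sent from client code is typically executed on the server in a similar form.

```sql
DECLARE @Field1 INT = 42;   -- assumed type and sample value

-- The statement text stays constant, so its cached plan can be reused;
-- only the parameter value changes between calls.
EXEC sys.sp_executesql
    N'SELECT TOP 1 * FROM MyTable WHERE Field1 = @Field1 ORDER BY ID DESC',
    N'@Field1 INT',
    @Field1 = @Field1;
```

Concatenating the value into the string instead would produce a different statement text for every value, which is what prevents plan reuse.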
It is true that the db has to do the extra work you mention, but that should not result in a big performance hit (unless you are running the query very, very frequently..)
Use sql profiler to see what is actually getting sent to the server. Use activity monitor to see if there are other queries blocking yours.
Your query couldn't be simpler. Is Field1 indexed? As others have said, there is no performance hit associated with "ad-hoc" queries.
As for where to put your queries, this is one of the oldest debates in tech. I would argue that your requests "belong" to your application. They will be versioned with your app, tested with your app and should disappear when your app disappears. Putting them anywhere other than in your app is walking into a world of pain. But for goodness sake, use .sql files, compiled as embedded resources.
A SELECT statement that is part of the FROM clause of another statement is called an inline query.
Cannot take parameters.
Not a database object.
Procedure:
Can take parameters.
Database object.
Can be used globally if the same action needs to be performed in several places.

SQL Cursor w/Stored Procedure versus Query with UDF

I'm trying to optimize a stored procedure I'm maintaining, and am wondering if anyone can clue me in to the performance benefits/penalities of the options below. For my solution, I basically need to run a conversion program on an image stored in an IMAGE column in a table. The conversion process lives in an external .EXE file. Here are my options:
Pull the results of the target table into a temporary table, and then use a cursor to go over each row in the table and run a stored procedure on the IMAGE column. The stored proc calls out to the .EXE.
Create a UDF that calls the .EXE file, and run a SQL query similar to "select UDFNAME(Image_Col) from TargetTable".
I guess what I'm looking for is an idea of how much overhead would be added by the creation of the cursor, instead of doing it as a set?
Some additional info:
The size of the set in this case is max. 1000
As an answer mentions below, if done as a set with a UDF, will that mean that the external program is opened 1000 times all at once? Or are there optimizations in place for that? Obviously, on a multi-processor system, it may not be a bad thing to have multiple instances of the process running, but 1000 might be a bit much.
Can you define "set-based" in this context?
If you have 100 rows, will this open the app 100 times in one shot? I would say test it. Even though you can call an extended proc from a UDF, I would still use a cursor for this, because set-based processing doesn't matter here: you are not manipulating data in the tables directly.
I did a little testing and experimenting, and when done in a UDF, it does indeed process each row at a time - SQL server doesn't run 100 processes for each of the 100 rows (I didn't think it would).
However, I still believe that doing this as a UDF instead of as a cursor would be better, because my research tends to show that the extra overhead of having to pull the data out in the cursor would slow things down. It may not make a huge difference, but it might save time versus pulling all of the data out into a temporary table first.

Resources