I am working in SQL Server 2008 and BIDS. Due to some performance problems, I am re-designing my current architecture. Currently, I have a stored procedure that contains many INSERT INTO ... SELECT statements. In my new architecture, I am trying to get the performance of SSIS for the inserts (instead of INSERT INTO in SSMS). So, my new stored proc will still have all of the SELECT statements (just no INSERT INTO before each of them). I will call this stored proc in SSIS (with a few parameters supplied that are needed by the SELECTs).

My goal is to have each SELECT write to a separate flat file. (Actually, certain groups of SELECTs will write to separate flat files, so that I have just a few -- instead of a billion -- flat file connection managers.)

I know how to execute a stored proc in SSIS and have it write a single multiple-row set to a flat file. But is it possible for the execution of one stored proc in SSIS to write several multiple-row sets to several flat files? If so, how can it be done?
You can have one stored proc write to as many files as you want. Please look at this article by Phil Factor, https://www.simple-talk.com/sql/t-sql-programming/reading-and-writing-files-in-sql-server-using-t-sql/
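For a concrete illustration, one common way to have T-SQL itself write a result set to a flat file is bcp queryout invoked through xp_cmdshell. This is only a sketch (not necessarily the technique the article uses); the server name, output path, and trusted-connection switch are assumptions, and xp_cmdshell must be enabled:

-- Export one SELECT's result set to one flat file; repeat with a different query/file per group of SELECTs.
DECLARE @cmd VARCHAR(4000);
SET @cmd = 'bcp "SELECT col1, col2 FROM MyDb.dbo.MyTable" queryout "C:\exports\file1.txt" -c -T -S MyServer';
EXEC master.dbo.xp_cmdshell @cmd;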
However, you are losing all the power of SSIS, such as redirection of error rows, logging, and parallel processing. What you need to do sounds like a perfect SSIS task (or series of tasks).
Using a Data Flow for dynamic export is not possible due to the strict metadata architecture of SSIS, but you can do it with Control Flow tasks. You have to build a BCP command in an Execute Process Task and call it for each table you want to export.
Steps:
Call SELECT * FROM INFORMATION_SCHEMA.TABLES and capture the result set in a variable
Use a Foreach Loop container to loop through the tables
Use an Execute Process Task to call BCP inside the loop (a sketch is shown below)
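As a rough sketch of steps 1-3, the query below builds one bcp OUT command string per table; in SSIS you would loop over these rows and pass each command's arguments to the Execute Process Task. The export path, the -T (trusted connection) switch, and the server name are assumptions:

-- One bcp command per base table; adjust database, path, and switches as needed.
SELECT 'bcp "' + TABLE_CATALOG + '.' + TABLE_SCHEMA + '.' + TABLE_NAME + '"'
       + ' out "C:\exports\' + TABLE_NAME + '.txt" -c -T -S MyServer' AS bcp_command
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';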
I have a scenario in which there is one activity in my Azure Data Factory pipeline. This activity copies data from history tables to archive tables, and a history table can have up to 600 million records. There is a SQL Server stored procedure (SP) in this activity which executes three child SPs using a while loop:
DECLARE @i INT = 1;
WHILE @i <= 3
BEGIN
    EXEC child_proc;   -- placeholder name for the child SP run on each iteration
    SET @i = @i + 1;
END
The 3 SPs copy data from history tables to archive tables in SQL DW. This activity is common to 600 pipelines, and different activities copy different numbers of tables.
But the while loop executes the child SPs one by one.
I tried searching for a way to parallelize the 3 SPs but found nothing in SQL Server.
I want to trigger all the child SPs at once. Is there any way I can do this? Any solution in SQL Server, Data Factory, Python script, or Spark will suffice.
You cannot execute the 3 child stored procedures in parallel inside a parent stored procedure, but you can execute the 3 child procedures directly, without any parent procedure.
Please follow the demonstration below, where I executed 3 stored procedures in parallel using Azure Data Factory (ForEach activity):
I have 3 stored procedures sp1, sp2 and sp3 that I want to execute in parallel. I created a parameter (Array) that holds the names of these stored procedures.
This parameter acts as the Items value (@pipeline().parameters.sp_names) in the ForEach activity. Here, in the ForEach activity, do not check the Sequential checkbox and specify the Batch count value as 3.
Now, inside the ForEach activity, create a Stored Procedure activity and the necessary linked service. While selecting the stored procedure name, check the Edit box and give the dynamic content for the stored procedure name as @item().
This approach runs the stored procedures in parallel. Look at the outputs when the same pipeline is executed with the ForEach activity set to sequential execution versus batch execution.
With Sequential execution:
With Batch execution:
We are currently running our first analytics prototype on Snowflake.
The objective is to create a comprehensive analysis result table that can be used for reporting based on ~60 structured raw data tables.
We created all the necessary SQL scripts using the built-in worksheet functionality. In total, we wrote around 80 worksheets, each with 5-10 SQL statements.
As a next step, we would like to automate the execution of these worksheets in a simple, sequential order. However, Tasks and Stored Procedures, the two built-in solutions we looked into, fail to execute more than one SQL statement in a single call, with the error:
Multiple SQL statements in a single API call are not supported; use one API call per statement instead.
How are you guys handling this? Do we really have to write individual tasks/stored procedures for every single SQL statement? In our case, this would easily amount to more than 500 of those.
Very much interested in your input, thanks!
It is possible to run multiple statements using tasks when the code is wrapped in a BEGIN ... END block:
CREATE OR REPLACE TASK test_task
WAREHOUSE = COMPUTE_WH
AS
BEGIN
CREATE OR REPLACE TABLE test_tab(i INT);
INSERT INTO test_tab(i) VALUES (1);
INSERT INTO test_tab(i) SELECT i * 10 FROM test_tab;
END;
Call:
EXECUTE TASK test_task;
SELECT *
FROM TABLE(information_schema.task_history())
ORDER BY scheduled_time;
SELECT * FROM test_tab;
-- 1
-- 10
I think your problem is you're using the wrong tool ;) The worksheets are just not meant for batch processing; if you want to do that, you should use the SnowSQL client:
https://docs.snowflake.com/en/user-guide/snowsql-use.html
You cannot use a worksheet to create stored procedures. You need to use the JavaScript API (until SQL stored procedures are made available).
https://docs.snowflake.com/en/sql-reference/stored-procedures-usage.html
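For reference, a minimal sketch of such a JavaScript stored procedure that runs several statements in a single call; the procedure name and the statements inside it are only placeholders:

CREATE OR REPLACE PROCEDURE run_batch()
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
  // Statements to run in order; replace with the SQL from your worksheets.
  var statements = [
    "CREATE OR REPLACE TABLE test_tab(i INT)",
    "INSERT INTO test_tab(i) VALUES (1)"
  ];
  for (var i = 0; i < statements.length; i++) {
    snowflake.createStatement({ sqlText: statements[i] }).execute();
  }
  return "Done";
$$;

CALL run_batch();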
We have a list of stored procedures (more than 1000) in a table which need to be executed every morning.
The stored procedures do not have any dependency with each other.
We have tried a WHILE loop and a cursor, but they take a lot of time to execute.
We thought of creating a job for each stored procedure and calling them using sp_start_job (sp_start_job runs asynchronously), which gave us a level of parallelism.
Problems arise as new stored procedures are added to the list and the number of jobs becomes huge:
Sometimes people forget to create the job for a new stored procedure.
The DB gets bombarded with a large number of jobs (a manageability issue for the DBA).
Note: the list may be altered any day (stored procedures can be added to or removed from the list).
If the SPs run for a long time, I would categorize the 1000 SPs into 5-10 groups, build one SSIS package for each group, and then create an Agent job for each package. Then schedule those jobs at the same time.
There are many ways to achieve this (loops, scripting, etc.) and multiple factors involved. You can test the different approaches and go with the best one.
Note: performance of the SSIS execution depends on your memory, processor, and hardware.
Adding to @Nick.MacDermaid's answer - you can utilize the MaxConcurrentExecutables property of the package to implement custom parallelism. Of course, you would need multiple containers and corresponding stored proc groups.
Parallel Execution in SSIS
MaxConcurrentExecutables, a property of the package, defines how many tasks (executables) can run simultaneously. It defaults to -1, which is translated to the number of processors plus 2. Please note that if your box has hyperthreading turned on, it is the logical processors rather than the physically present processors that are counted.
Hi, you can use the following piece of code to generate a script that runs all your stored procedures; if you add a new procedure, it will automatically be added to the list:
SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE';
After this, take the result set, put it into a tab-delimited text file, and save the file in a location.
Use this link to import the text into an Execute SQL Task; the first answer works well:
SSIS: How do I pull a SQL statement from a file into a string variable?
Execute the task and it should work. If you need to narrow the list of procedures, you can use a specific prefix in the procedure names and filter on it in the WHERE clause (see the example below).
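For example, if the relevant procedures share a naming convention, the filter could look like this (the usp_daily prefix is only an illustration):

SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE'
  AND SPECIFIC_NAME LIKE 'usp_daily%';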
It will run in serial; sorry, I don't have enough rep to comment yet.
I am creating a process that automates testing the consistency in database tables across servers.
I have a test_master table which contains following columns:
test_id, test_name, test_status
and Job_master table which contains following columns:
jid, test_id, job_name, job_type, job_path, job_status,
server_ip, db, error_description, op_table, test_table,
copy_status, check_status
There can be multiple jobs for a particular test. The jobs are logical jobs (not SQL Agent jobs); they can be a script, a procedure, or an SSIS package.
So I have made an SSIS package:
In pre-execute, it picks up the tests which aren't done yet.
Each job runs and writes the name of the live table into the op_table field.
In post-execute, the live tables are copied to a test database environment and the table name is put into test_table; testing will be performed there only.
Here the jobs are running in a loop. Is there a way to let the jobs run in parallel, since they are independent of each other?
Can I write a SQL procedure for this inside the loop, or is there any other way I can do this?
Any new ideas are welcome...
Thank you very much.. :)
Very roughly, I would put the approach as below:
SQL bits
Wrap whatever SQL code is part of a "job" into a stored proc. Inside this proc, populate a variable that holds the SQL bit and execute it using dynamic SQL. Update the job status in the same proc, with the help of a TRY-CATCH-THROW construct; a minimal sketch follows.
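A rough sketch of such a wrapper proc, assuming the Job_master table described in the question; the procedure name, parameter, and the @sql payload are purely illustrative:

CREATE PROCEDURE dbo.usp_run_job
    @jid INT
AS
BEGIN
    DECLARE @sql NVARCHAR(MAX);

    -- Illustrative: build or look up the SQL this particular job has to run.
    SET @sql = N'SELECT 1;';

    BEGIN TRY
        EXEC sp_executesql @sql;
        UPDATE dbo.Job_master SET job_status = 'Success' WHERE jid = @jid;
    END TRY
    BEGIN CATCH
        UPDATE dbo.Job_master
        SET job_status = 'Failed',
            error_description = ERROR_MESSAGE()
        WHERE jid = @jid;
        THROW;   -- re-raise so the caller (e.g. SSIS) also sees the failure
    END CATCH
END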
Packages
Populate the names of the packages in an SSIS string variable in a delimited fashion (or use an object variable, whatever suits you). Then, in a Script Task, iterate through the list of packages and fire them using the dtexec command. To update the job status, it's best to have the invoked packages take care of updating their own job status. If that is not an option, use a try-catch construct and update the job statuses accordingly. This is a helpful link.
Do a check on the job_type variable at the top of the SSIS package (using precedence constraints) and route the jobs into the correct 'block'.
We have the below requirement:
A large text file of size 44 GB containing insert scripts for a table is given. We need to execute these scripts against a target SQL Server 2008 R2 database. We followed a 2-step process to execute the scripts:
1. Bulk inserted all the insert statements into an intermediate table, one by one (approx. 22 million records).
2. Then executed the statements in the intermediate table using a cursor.
The first step succeeds; however, the second step is not so effective, as it is slow and a few insert statements fail in the middle of execution. We are unable to locate the exact point of failure. Could you please let us know an effective way of accomplishing this task?
Using a cursor is generally not recommended, as it is slow and a memory hog. Try using a WHILE loop instead (a rough sketch follows the reference below).
Reference example:
SQL Server stored procedure avoid cursor
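For illustration only, a WHILE loop over the intermediate table could look roughly like this; the table and column names (insert_statements, id, stmt) and the failed_statements logging table are assumptions:

DECLARE @id BIGINT = 0, @stmt NVARCHAR(MAX);

WHILE 1 = 1
BEGIN
    -- Take the next statement after the last one processed.
    SELECT TOP (1) @id = id, @stmt = stmt
    FROM dbo.insert_statements
    WHERE id > @id
    ORDER BY id;

    IF @@ROWCOUNT = 0 BREAK;   -- no more rows to process

    BEGIN TRY
        EXEC sp_executesql @stmt;
    END TRY
    BEGIN CATCH
        -- Record which row failed instead of stopping the whole run.
        INSERT INTO dbo.failed_statements (id, error_message)
        VALUES (@id, ERROR_MESSAGE());
    END CATCH
END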