I have a scenario in which there is one activity in my Azure Data Factory pipeline. This activity copies data from history tables to archive tables, and a history table can have up to 600 million records. The activity runs a SQL Server stored procedure (SP) that executes three child SPs in a while loop:
DECLARE @i INT = 0;
WHILE @i < 3
BEGIN
    EXEC child_proc;  -- one of the three child SPs per iteration
    SET @i = @i + 1;
END
The 3 SPs copy data from the history table to the archive table in SQL DW. This activity is common to 600 pipelines, and different activities copy different numbers of tables.
But the while loop executes the child SPs one by one.
I tried searching for a way to parallelize the 3 SPs but found nothing in SQL Server.
I want to trigger all the child SPs at once. Is there any way I can do this? Any solution in SQL Server, Data Factory, Python script, or Spark will suffice.
You cannot execute 3 child stored procedures in parallel inside a parent stored procedure. But you can execute the 3 child procedures directly, without any parent procedure.
Please follow the demonstration below, where I executed 3 stored procedures in parallel using Azure Data Factory's ForEach activity:
I have 3 stored procedures sp1, sp2 and sp3 that I want to execute in parallel. I created a parameter (Array) that holds the names of these stored procedures.
This parameter acts as the Items value (@pipeline().parameters.sp_names) of the ForEach activity. In the ForEach activity, leave the Sequential checkbox unchecked and set the Batch count to 3.
Now, inside the ForEach activity, create a Stored procedure activity and create the necessary linked service. When selecting the stored procedure name, check the Edit box and give the dynamic content @item() as the stored procedure name.
This setup runs the stored procedures in parallel. Look at the outputs when the same pipeline is executed with the ForEach activity in sequential mode versus batch mode.
With Sequential execution:
With Batch execution:
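Outside Data Factory, the same fan-out can be sketched in Python with a thread pool. This is only a sketch: run_proc is a stand-in for a real database call (e.g. via pyodbc, which is not shown here), and the procedure names are the ones from the demo above.

```python
from concurrent.futures import ThreadPoolExecutor

def run_proc(name):
    # Stand-in for a real database call, e.g. with pyodbc:
    #   cursor.execute(f"EXEC {name}")   -- connection/cursor not shown
    return f"{name} done"

proc_names = ["sp1", "sp2", "sp3"]

# max_workers=3 mirrors the ForEach batch count of 3
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_proc, proc_names))

print(results)  # ['sp1 done', 'sp2 done', 'sp3 done']
```

Each worker would hold its own database connection, so the three calls land on the server as three independent sessions, just like the three parallel Stored procedure activities.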
I have a batch load process that loads data into a staging database. I have a number of tasks that execute stored procedures which move the data to a different database. The tasks are executed when the SYSTEM$STREAM_HAS_DATA condition is satisfied on a given table.
I have a separate stored procedure that I want to execute only after the tasks have completed moving the data.
However, I have no way to know which tables will receive data and therefore do not know which tasks will be executed.
How can I know when all the tasks that satisfied the SYSTEM$STREAM_HAS_DATA condition are finished and I can now kick off the other stored procedure? Is there a way to orchestrate this step by step process similar to how you would in a SQL job?
There is no automated way but you can do it with some coding.
You may create a stored procedure that checks the STATE column of the task_history view to see whether the tasks completed or were skipped:
https://docs.snowflake.com/en/sql-reference/functions/task_history.html
You can call this stored procedure periodically using a task (every 5 minutes, for example).
Based on the checks inside the stored procedure (all tasks succeeded, the target SP hasn't run yet today, etc.), you can execute your target stored procedure, which needs to run after all tasks have completed.
You can also check the status of all the streams via SELECT SYSTEM$STREAM_HAS_DATA('<stream_name>'), which does not consume the stream, or via SELECT COUNT(*) FROM <stream_name>.
Look into using IDENTIFIER for dynamic queries.
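The decision logic inside such a checker procedure is simple enough to sketch; here it is in Python, with the STATE lists hard-coded as stand-ins for rows read from the task_history view:

```python
# STATE values we treat as terminal/"done" in task_history (assumption:
# you consider both completed and skipped tasks as finished)
DONE_STATES = {"SUCCEEDED", "SKIPPED"}

def all_tasks_finished(states):
    """True when every monitored task has either run or been skipped."""
    return bool(states) and all(s in DONE_STATES for s in states)

# Stand-in state lists; in practice these come from task_history
assert all_tasks_finished(["SUCCEEDED", "SKIPPED", "SUCCEEDED"])
assert not all_tasks_finished(["SUCCEEDED", "SCHEDULED"])
```

If this check passes and the target SP hasn't run yet today, the periodic task calls the target stored procedure; otherwise it simply exits and tries again on the next schedule.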
I need to run a stored procedure with different parameters each time - there is no data overlap but some temp tables are being created and dropped inside the procedure.
I am calling it via ADF stored procedure activity which is being called in a for each loop.
For now - I am running it sequentially but I want to speed it up without any conflicts - hence want to parallelize it.
How can I keep it ACID compliant (non overlapping transactions) as well as run multiple instances of it in parallel at the same time?
The question is really: will multiple instances of the proc be triggered if I do this, and if so, how can I ensure that one whole run of the proc is a single transaction that creates and drops temp tables without impacting other parallel runs?
Just don't check the Sequential setting in the ForEach activity:
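Each Stored procedure activity invocation then opens its own session, and session-scoped #temp tables are private to that session, so parallel runs don't collide. A rough Python analogy (not SQL Server itself): each worker below builds and discards its own "temp table", and no run sees another's state.

```python
from concurrent.futures import ThreadPoolExecutor

def run_once(param):
    # Analogy for one proc invocation: this dict plays the role of a
    # session-scoped #temp table -- created, used, and dropped per run.
    temp_table = {"param": param, "rows": [param * 10]}
    result = sum(temp_table["rows"])
    # temp_table goes out of scope here, like DROP TABLE #t at proc end
    return result

with ThreadPoolExecutor(max_workers=4) as pool:
    results = sorted(pool.map(run_once, [1, 2, 3, 4]))

print(results)  # [10, 20, 30, 40]
```

For transactional integrity, wrap the proc body in its own BEGIN TRAN / COMMIT (with TRY-CATCH for rollback); since each parallel run is a separate session and transaction, the runs stay isolated from each other.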
I have tried to find an answer to my issue but found nothing appropriate.
We have a Java web application where the data is loaded on the launch screen based on the user roles.
The data on the launch screen is fetched by executing a stored procedure, which returns a ResultSet; the data is then processed from it for display on the launch screen.
My Queries:
If multiple people launch the web application simultaneously, will the stored procedure be executed multiple times? Will the executions run in parallel on a single instance of the database stored procedure, or does the database create a new instance of the stored procedure for every request? Basically, I am curious to know what happens behind the scenes in this scenario.
Note: inside this Sybase ASE stored procedure we use a lot of temporary tables, into which data is inserted and removed based on several conditions. And based on roles, different users will get different results.
What is the scope of a temporary table within a stored procedure, and, as mentioned in point 1, if multiple requests access the stored procedure in parallel, what is the impact on the temporary tables?
And, based on point 2, is there a chance of database blocking or deadlock situations occurring because of temporary tables within a stored procedure?
1: Two executions of the same stored proc are completely independent (there may be some commonality in the query plan, but that does not affect the results).
2: See 1. Temp tables are specific to the stored proc invocation and the user's session; temp tables are dropped automatically at the end of the proc (if you didn't drop them already).
3: There cannot be locking/blocking issues on the temp tables themselves. But there can, of course, always be locking/blocking issues on the other tables being queried (for example, to populate the temp tables). Nothing special here.
I am working in SQL Server 2008 and BIDS. Due to some performance problems, I am re-designing my current architecture. Currently, I have a stored procedure that has many INSERT INTO SELECT statements inside of it. In my new architecture, I am trying to get the performance of SSIS for inserts (instead of INSERT INTO in SSMS). So, my new stored proc will still have all of the SELECT statements (just no INSERT INTO before each of them). I will call this stored proc in SSIS (with a few parameters supplied that are needed by the SELECTs). My goal is to have each SELECT write to a separate flat file. (Actually, certain groups of SELECTs will write to separate flat files, such that I have just a few -- instead of a billion -- flat file connection managers.) I know how to execute a stored proc in SSIS and have it write a multiple-row set to a flat file. But is it possible for the execution of one stored proc in SSIS to write several multiple-row sets to several flat files? If so, how can it be done?
You can have one stored proc write to as many files as you want. Please look at this article by Phil Factor, https://www.simple-talk.com/sql/t-sql-programming/reading-and-writing-files-in-sql-server-using-t-sql/
However, you are losing all the power of SSIS, such as redirection of error rows, logging, and parallel processing. What you need to do sounds like a perfect SSIS task (or series of tasks).
Using a Data Flow for dynamic export is not possible due to the strict metadata architecture of SSIS. But you can do it using Control Flow tasks. You have to write a BCP command in an Execute Process Task and call it for each table you want to export.
Steps:
Call SELECT * FROM information_schema.tables and capture the result set in a variable
Use a Foreach Loop task to loop through the tables
Use an Execute Process Task to call BCP inside the loop.
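The loop body boils down to composing one bcp command per table. A sketch of the command string (the server name, output directory, and trusted-connection flag here are assumptions; adjust for your environment):

```python
def bcp_command(table, out_dir="C:\\export", server="MYSERVER"):
    # bcp <table> out <file> -c (character mode) -S <server> -T (trusted connection)
    return f'bcp "{table}" out "{out_dir}\\{table}.txt" -c -S {server} -T'

# Table names would come from the information_schema.tables result set
tables = ["dbo.Customers", "dbo.Orders"]
commands = [bcp_command(t) for t in tables]
print(commands[0])
```

In the package, the same composition is done with SSIS expressions on the Execute Process Task's Arguments property, using the current table name from the Foreach Loop variable.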
I am creating a process that automates testing the consistency in database tables across servers.
I have a test_master table which contains following columns:
test_id, test_name, test_status
and Job_master table which contains following columns:
jid, test_id, job_name, job_type, job_path, job_status,
server_ip, db, error_description, op_table, test_table,
copy_status, check_status
There can be multiple jobs for a particular test. The jobs are logical jobs (not SQL Agent jobs); a job can be a script, a procedure, or an SSIS package.
So I have made an ssis package :
In pre-execute, it picks up tests that aren't done yet.
Each job runs and writes the name of the live table into the op_table field
In post-execute, the live tables are copied to a test database environment and the table name is put into test_table; testing will be performed there only.
Here the jobs are running in a loop... Is there a way to run the jobs in parallel, since they are independent of each other?
Can I write a SQL procedure for this inside the loop, or is there any other way I can do this?
Any new ideas are welcome...
Thank you very much.. :)
Very roughly, I would put the approach as below:
SQL bits
Wrap whatever SQL code is part of a "job" into a stored proc. Inside this proc, populate a variable that holds the SQL bit and execute it using dynamic SQL. Update the job status in the same proc, using a TRY-CATCH-THROW construct.
Packages
Populate the names of the packages in a delimited SSIS string variable (or an object variable, whatever suits you). Then, in a script task, iterate through the list of packages and fire them using the dtexec command. To update the job status, it's best to have the invoked packages update it themselves. If that is not an option, use a try-catch construct and update the job statuses accordingly.
Do a check on the job_type variable at the top of the SSIS package (using a precedence constraint) and route the jobs into the correct 'block'.
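The script-task loop is usually written in C#, but its shape is simple enough to sketch here in Python: split the delimited variable, build one dtexec call per package, and fire each one. The package names and folder are hypothetical.

```python
# Delimited SSIS string variable (package names are hypothetical)
package_list = "LoadA.dtsx;LoadB.dtsx"

commands = []
for pkg in package_list.split(";"):
    # dtexec /F runs a package from the file system
    commands.append(f'dtexec /F "C:\\Packages\\{pkg}"')
# In the real script task, each command would then be launched (e.g. with
# System.Diagnostics.Process in C#, or subprocess.run here) and its exit
# code used to update job_status in Job_master.

print(commands)
```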