Snowflake orchestration of tasks

I have a batch load process that loads data into a staging database. I have a number of tasks that execute stored procedures which move the data to a different database. The tasks are executed when the SYSTEM$STREAM_HAS_DATA condition is satisfied on a given table.
I have a separate stored procedure that I want to execute only after the tasks have completed moving the data.
However, I have no way to know which tables will receive data and therefore do not know which tasks will be executed.
How can I know when all the tasks that satisfied the SYSTEM$STREAM_HAS_DATA condition are finished, so that I can kick off the other stored procedure? Is there a way to orchestrate this step-by-step process similar to how you would in a SQL job?

There is no automated way, but you can do it with some coding.
You can create a stored procedure that checks the STATE column of the TASK_HISTORY table function to see whether the tasks completed or were skipped:
https://docs.snowflake.com/en/sql-reference/functions/task_history.html
You can call this stored procedure periodically using a task (for example, every 5 minutes).
Based on the checks inside the stored procedure (all tasks succeeded, the target SP hasn't run yet today, etc.), you can execute the target stored procedure that needs to run after all the tasks have completed.
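As a rough sketch of that polling pattern (the warehouse, task and procedure names, and the LOAD_TASK_% naming convention, are all assumptions for illustration):
CREATE OR REPLACE TASK poll_load_tasks
  WAREHOUSE = my_wh
  SCHEDULE = '5 MINUTE'
AS
  CALL run_target_if_loads_done();

-- Inside run_target_if_loads_done(), a check along these lines:
SELECT COUNT(*) AS still_running
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
       SCHEDULED_TIME_RANGE_START => DATEADD('hour', -6, CURRENT_TIMESTAMP())))
WHERE NAME LIKE 'LOAD_TASK_%'                -- assumed task naming convention
  AND STATE IN ('SCHEDULED', 'EXECUTING');   -- anything not yet finished
-- If still_running = 0 and the target SP hasn't run today, CALL the target SP.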

You can also check the status of all the streams via SELECT SYSTEM$STREAM_HAS_DATA('<stream_name>'), which does not consume the stream, or via SELECT COUNT(*) FROM <stream_name>.
Look into using IDENTIFIER for dynamic queries.
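A quick sketch of both checks; the stream name and the session variable are placeholders:
-- Returns TRUE/FALSE without consuming the stream.
SELECT SYSTEM$STREAM_HAS_DATA('STAGING.PUBLIC.ORDERS_STREAM');

-- Plain SELECTs also leave the stream offset untouched; only DML consumes it.
SELECT COUNT(*) FROM STAGING.PUBLIC.ORDERS_STREAM;

-- IDENTIFIER() lets you parameterize the stream name, e.g. via a session variable.
SET stream_name = 'STAGING.PUBLIC.ORDERS_STREAM';
SELECT COUNT(*) FROM IDENTIFIER($stream_name);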

Related

How to execute SQL Server query statements in parallel?

I have a scenario in which there is one activity in my Azure Data Factory pipeline. This activity copies data from history tables to archive tables, and a history table can have up to 600 million records. There is a SQL Server stored procedure (SP) in this activity which executes three child SPs using a while loop:
DECLARE @i INT = 0;
WHILE @i < 3
BEGIN
    EXEC child_proc;   -- runs one child SP per iteration
    SET @i = @i + 1;
END
The 3 SPs copy data from the history table to the archive table in SQL DW. This activity is common to 600 pipelines, and different activities copy different numbers of tables.
But the while loop executes the child SPs one by one.
I tried searching for a way to parallelize the 3 SPs but found nothing in SQL Server.
I want to trigger all the child SPs at once. Is there any way I can do this? Any solution in SQL Server, Data Factory, Python script, or Spark will suffice.
You cannot execute 3 child stored procedures in parallel inside a parent stored procedure, but you can execute the 3 child procedures directly without any parent procedure.
Please follow the demonstration below, where I executed 3 stored procedures in parallel using an Azure Data Factory ForEach activity:
I have 3 stored procedures, sp1, sp2 and sp3, that I want to execute in parallel. I created a pipeline parameter (type Array) that holds the names of these stored procedures.
This parameter is used as the Items value (@pipeline().parameters.sp_names) in the ForEach activity. In the ForEach activity, do not check the Sequential checkbox, and set the Batch count to 3.
Inside the ForEach activity, create a Stored Procedure activity and the necessary linked service. When selecting the stored procedure name, check the Edit box and supply the dynamic content @item() as the stored procedure name.
This runs the stored procedures in parallel. Compare the outputs when the same pipeline is executed with the ForEach activity in sequential mode versus batch mode:
[Screenshots: run output with sequential execution vs. with batch execution]
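For reference, the pieces wired together above might be summarized like this (the parameter name sp_names and procedure names sp1-sp3 come from the walkthrough; the layout is illustrative):
Pipeline parameter sp_names (Array):   ["sp1", "sp2", "sp3"]
ForEach > Items:                       @pipeline().parameters.sp_names
ForEach > Sequential:                  unchecked, Batch count = 3
Stored Procedure activity > Name:      @item()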

How can I run multiple instances of a SQL Server stored procedure in parallel

I need to run a stored procedure with different parameters each time; there is no data overlap, but some temp tables are created and dropped inside the procedure.
I am calling it via an ADF Stored Procedure activity inside a ForEach loop.
For now I am running it sequentially, but I want to speed it up without any conflicts, hence I want to parallelize it.
How can I keep it ACID compliant (non-overlapping transactions) while running multiple instances of it in parallel at the same time?
The question is really: will multiple instances of the proc be triggered if I do this, and if yes, how can I ensure one whole run of the proc is a single transaction that creates and drops temp tables within that transaction without impacting other parallel runs?
Don't check the Sequential option in the ForEach activity; each iteration then invokes its own instance of the stored procedure in parallel. Local temp tables (#table) are scoped to the session that created them, so parallel runs won't see each other's temp tables.
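A minimal sketch of a procedure body where each run is one self-contained transaction (all object names are hypothetical):
CREATE OR ALTER PROCEDURE dbo.archive_partition @PartitionId INT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;   -- any error rolls back the whole transaction

    BEGIN TRANSACTION;

    -- Local temp tables are private to this session, so parallel runs don't collide.
    CREATE TABLE #staging (Id INT PRIMARY KEY, Payload NVARCHAR(200));

    INSERT INTO #staging (Id, Payload)
    SELECT Id, Payload FROM dbo.History WHERE PartitionId = @PartitionId;

    INSERT INTO dbo.Archive (Id, Payload)
    SELECT Id, Payload FROM #staging;

    DROP TABLE #staging;

    COMMIT TRANSACTION;
END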

Task with multiple stored procedures

Is there a way to create a task within snowflake to call multiple stored procedures?
For example I have three stored procedures to check for duplicated information over multiple tables, I'd like to call all three through the task without having to create a new SP to loop through them all.
A task can only trigger one SQL statement or one stored procedure.
So you have to decide:
One task for each procedure, with dependencies between the tasks
One task with a wrapper procedure that calls all three stored procedures (the solution you do not want)
I think chaining the tasks is a good solution. You have to use the AFTER clause within your CREATE TASK statement to achieve the correct dependencies: https://docs.snowflake.com/en/sql-reference/sql/create-task.html
A task can only call 1 SP, so if you don't want to write one SP that calls the others, how about creating a chain of 3 tasks?
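A sketch of such a chain, assuming three existing procedures named check_dupes_1 through check_dupes_3 (names and warehouse are illustrative):
CREATE TASK check_dupes_task_1
  WAREHOUSE = my_wh
  SCHEDULE = 'USING CRON 0 6 * * * UTC'
AS
  CALL check_dupes_1();

CREATE TASK check_dupes_task_2
  WAREHOUSE = my_wh
  AFTER check_dupes_task_1
AS
  CALL check_dupes_2();

CREATE TASK check_dupes_task_3
  WAREHOUSE = my_wh
  AFTER check_dupes_task_2
AS
  CALL check_dupes_3();

-- Tasks are created suspended; resume the children before the root.
ALTER TASK check_dupes_task_3 RESUME;
ALTER TASK check_dupes_task_2 RESUME;
ALTER TASK check_dupes_task_1 RESUME;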

How to run multiple stored procedures in parallel using SSIS?

We have a list of stored procedures (more than 1000) in a table which need to be executed every morning.
The stored procedures have no dependencies on each other.
We tried a WHILE loop and a cursor, but execution took a lot of time.
We thought of creating a job for each stored procedure and calling them using sp_start_job (sp_start_job runs asynchronously), which gave us a level of parallelism.
Problems arose as the list grew huge:
sometimes people forgot to create the job for a new stored procedure
the DB got bombarded with a large number of jobs (a manageability issue for the DBA)
Note: the list may be altered on any day (stored procedures can be added or removed from the list).
If the SPs run for a long time, I would categorize the 1000 SPs into 5-10 groups, create one SSIS package for each group and an Agent job for each package, then schedule those jobs at the same time.
There are many approaches (loops, scripting) and multiple factors involved; test the different ways and go with the best one.
Note: performance of the SSIS execution depends on your memory, processor and hardware.
Adding to @Nick.MacDermaid's answer: you can utilize the MaxConcurrentExecutables property of the package to implement custom parallelism. Of course, you would need multiple containers and corresponding stored procedure groups.
Parallel Execution in SSIS
MaxConcurrentExecutables, a property of the package, defines how many tasks (executables) can run simultaneously. It defaults to -1, which is translated to the number of processors plus 2. Please note that if your box has hyperthreading turned on, it is the logical processors rather than the physically present processors that are counted.
You can use the following piece of code to generate a script that runs all your stored procedures; if you add a new procedure, it will automatically be added to the list:
SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE';
After this, take the result set, put it into a tab-delimited text file, and save the file in a location.
Use this link to import the text into an Execute SQL Task; the first answer works well:
SSIS: How do I pull a SQL statement from a file into a string variable?
Execute the task and it should work. If you need to narrow the list of procedures, you can use a specific prefix in the procedure names and filter on it in the WHERE clause.
It will run in serial; sorry, I don't have enough rep to comment yet.
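For example, a sketch of that prefix filter (the nightly_ prefix is an assumption):
SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE'
  AND SPECIFIC_NAME LIKE 'nightly_%';   -- assumed naming convention for the morning batch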

Pass status information from Stored Procedure to caller inside transaction

I have a long-running SP (it can run for up to several minutes) that basically performs a number of cleanup operations on various tables within a transaction. I'm trying to determine the best way to somehow pass human-readable status information back to the caller on what step of the process the SP is currently performing.
Because the entire SP runs inside a single transaction, I can't write this information back to a status table and then read it from another thread unless I use NOLOCK to read it, which I consider a last resort since:
NOLOCK can cause other data inconsistency issues; and
this places the onus on anyone wanting to read the status table that they need to use NOLOCK because the table or row(s) could be locked for quite a while.
Is there any way to issue a single command (or EXEC a second SP) within a transaction and specify that that particular command shouldn't be part of the transaction? Or is there some other way for ADO.NET to gain insight into this long-running SP to see what it is currently doing?
You can PRINT messages in T-SQL and have them delivered to your SqlConnection in ADO.NET via the InfoMessage event. See http://msdn.microsoft.com/en-us/library/a0hee08w.aspx for details.
You could try using RAISERROR (use a severity of 10 or lower) within the procedure to return informational messages.
Example:
RAISERROR(N'Step 5 completed.', 10, 1) WITH NOWAIT;
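A small sketch combining both answers; the procedure and step names are made up. Severity 10 keeps RAISERROR informational, and WITH NOWAIT flushes each message to the client immediately rather than waiting for the buffer, even mid-transaction:
CREATE OR ALTER PROCEDURE dbo.cleanup_with_progress
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRANSACTION;

    RAISERROR(N'Step 1: purging orphaned rows...', 10, 1) WITH NOWAIT;
    -- DELETE ... (step 1 cleanup work)

    RAISERROR(N'Step 2: rebuilding lookup data...', 10, 1) WITH NOWAIT;
    -- UPDATE ... (step 2 cleanup work)

    COMMIT TRANSACTION;
    RAISERROR(N'Cleanup finished.', 10, 1) WITH NOWAIT;
END
On the ADO.NET side, subscribe to SqlConnection.InfoMessage to receive each message as it arrives.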
