Task with multiple stored procedures - snowflake-cloud-data-platform

Is there a way to create a task within Snowflake to call multiple stored procedures?
For example, I have three stored procedures that check for duplicated information across multiple tables. I'd like to call all three from the task without having to create a new SP that loops through them all.

A task can only trigger one SQL statement or one Stored Procedure.
So you have to decide:
One task for each procedure, with dependencies between the tasks
One task with a wrapper procedure that calls all three stored procedures (the solution you do not want)
I think chaining the tasks is a good solution. You have to use the AFTER clause in your CREATE TASK statement to achieve the correct dependencies: https://docs.snowflake.com/en/sql-reference/sql/create-task.html
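A minimal sketch of such a chain, assuming hypothetical procedure and warehouse names (only the root task carries a schedule; the children use AFTER):

CREATE TASK check_dupes_t1
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  CALL check_duplicates_table1();

CREATE TASK check_dupes_t2
  WAREHOUSE = my_wh
  AFTER check_dupes_t1
AS
  CALL check_duplicates_table2();

CREATE TASK check_dupes_t3
  WAREHOUSE = my_wh
  AFTER check_dupes_t2
AS
  CALL check_duplicates_table3();

-- Tasks are created suspended; resume the children first, then the root
ALTER TASK check_dupes_t3 RESUME;
ALTER TASK check_dupes_t2 RESUME;
ALTER TASK check_dupes_t1 RESUME;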

A task can only call one SP, so if you don't want to write one SP that calls the others, how about creating a chain of 3 tasks?

Related

How to execute SQL Server query statements in parallel?

I have a scenario in which there is one activity in my Azure Data Factory pipeline. This activity copies data from history tables to archive tables, and a history table can have up to 600 million records. There is a SQL Server stored procedure (SP) in this activity which executes three child SPs using a while loop:
DECLARE @i INT = 0;
WHILE @i < 3
BEGIN
    EXEC dbo.child_proc;   -- runs the child SPs one at a time
    SET @i = @i + 1;
END
The 3 SPs copy data from the history tables to the archive tables in SQL DW. This activity is common to 600 pipelines, and different activities copy different numbers of tables.
But the while loop executes the child SPs one by one.
I tried searching for a way to parallelize the 3 SPs but found nothing in SQL Server.
I want to trigger all the child SPs at once. Is there any way I can do this? Any solution in SQL Server, Data Factory, a Python script or Spark will suffice.
You cannot execute the 3 child stored procedures in parallel inside a parent stored procedure, but you can execute the 3 child procedures directly without requiring any parent procedure.
Please follow the demonstration below, where I executed 3 stored procedures in parallel using Azure Data Factory (For Each activity):
I have 3 stored procedures sp1, sp2 and sp3 that I want to execute in parallel. I created a parameter (Array) that holds the names of these stored procedures.
This parameter acts as the Items value (@pipeline().parameters.sp_names) in the For Each activity. In the For Each activity, do not check the Sequential checkbox, and specify the Batch count as 3.
Now inside the For Each activity, create a Stored Procedure activity and create the necessary linked service. While selecting the stored procedure name, check the Edit box and give the dynamic content for the stored procedure name as @item().
This approach runs the stored procedures in parallel; comparing the run outputs of the For Each activity in sequential mode versus batch mode shows the difference (output screenshots of the sequential and batch executions are not reproduced here).

Snowflake orchestration of tasks

I have a batch load process that loads data into a staging database. I have a number of tasks that execute stored procedures which move the data to a different database. The tasks are executed when the SYSTEM$STREAM_HAS_DATA condition is satisfied on a given table.
I have a separate stored procedure that I want to execute only after the tasks have completed moving the data.
However, I have no way to know which tables will receive data and therefore do not know which tasks will be executed.
How can I know when all the tasks that satisfied the SYSTEM$STREAM_HAS_DATA condition are finished and I can now kick off the other stored procedure? Is there a way to orchestrate this step by step process similar to how you would in a SQL job?
There is no automated way but you can do it with some coding.
You may create a stored procedure that checks the STATE column of the TASK_HISTORY table function to see whether the tasks completed or were skipped:
https://docs.snowflake.com/en/sql-reference/functions/task_history.html
You can call this stored procedure periodically using a task (e.g. every 5 minutes).
Based on the checks inside the stored procedure (all tasks succeeded, the target SP hasn't been executed today yet, etc.), you can execute your target stored procedure, which needs to run after all tasks have completed.
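A minimal sketch of such a check, assuming hypothetical task names and a one-hour lookback window:

SELECT name, state, scheduled_time, completed_time
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
       SCHEDULED_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP()),
       RESULT_LIMIT => 100))
WHERE name IN ('LOAD_TASK_A', 'LOAD_TASK_B', 'LOAD_TASK_C')   -- your task names
  AND state NOT IN ('SUCCEEDED', 'SKIPPED');                   -- anything still pending or failed

-- If this returns zero rows, every relevant task has finished (or was skipped),
-- and the wrapper procedure can CALL the target stored procedure.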
You can also check the status of all the streams via SELECT SYSTEM$STREAM_HAS_DATA('<stream_name>'), which does not process the stream, or via SELECT COUNT(*) FROM <stream_name>.
Look into using IDENTIFIER for dynamic queries.
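For example, a small sketch of checking a stream whose name is held in a session variable (the stream name is illustrative):

SET stream_name = 'MY_DB.MY_SCHEMA.ORDERS_STREAM';
SELECT SYSTEM$STREAM_HAS_DATA($stream_name);     -- TRUE if the stream holds change records
SELECT COUNT(*) FROM IDENTIFIER($stream_name);   -- a plain SELECT does not consume the stream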

Why, when and where do we use stored procedures, views and functions?

I often use a stored procedure for data access purposes, but I don't know which one is best: a view, a stored procedure or a function?
Please tell me which of the above is best for data access and why, listing the reasons with examples please.
I searched Google to learn which one is best but did not find the answer I expected.
View
A view is a “virtual” table consisting of a SELECT statement; by “virtual”
I mean no physical data is stored by the view -- only the definition of the view is stored inside the database, unless you materialize the view by putting an index on it.
By definition you cannot pass parameters to a view.
No DML operations (e.g. INSERT, UPDATE, and DELETE) are allowed inside the view; ONLY SELECT statements.
Most of the time, a view encapsulates complex joins so they can be reused in queries or stored procedures. It can also provide a level of isolation and security by hiding sensitive columns of the underlying tables.
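For instance, a minimal sketch with hypothetical table and column names, showing a view that encapsulates a join and hides sensitive columns:

CREATE VIEW dbo.vw_customer_orders AS
SELECT c.customer_id, c.customer_name, o.order_id, o.order_total
FROM dbo.customers AS c
JOIN dbo.orders AS o ON o.customer_id = c.customer_id;
-- columns such as c.ssn or c.credit_card_number are simply not exposed

SELECT * FROM dbo.vw_customer_orders WHERE order_total > 100;  -- reused like a table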
Stored procedure
A stored procedure is a group of Transact-SQL statements compiled into a single execution plan -- in other words, a saved collection of Transact-SQL statements.
A stored procedure:
accepts parameters
can NOT be used as a building block in a larger query
can contain several statements, loops, IF ELSE, etc.
can perform modifications to one or several tables
can NOT be used as the target of an INSERT, UPDATE or DELETE statement
A view:
does NOT accept parameters
can be used as a building block in a larger query
can contain only one single SELECT query
can NOT perform modifications to any table
but can (sometimes) be used as the target of an INSERT, UPDATE or DELETE statement.
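A brief sketch of that contrast, with hypothetical object names:

-- A view takes no parameters; filtering happens in the outer query,
-- where the view is used as a building block
CREATE VIEW dbo.vw_orders AS
SELECT order_id, customer_id, order_date FROM dbo.orders;

SELECT * FROM dbo.vw_orders WHERE customer_id = 42;

-- A stored procedure accepts the parameter directly and may contain
-- several statements, but cannot be referenced inside a SELECT
CREATE PROCEDURE dbo.usp_orders_for_customer @customer_id INT
AS
BEGIN
    SELECT order_id, order_date
    FROM dbo.orders
    WHERE customer_id = @customer_id;
END;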
Functions
Functions are subroutines made up of one or more Transact-SQL statements that can be used to encapsulate code for reuse.
There are three types of UDF (scalar, inline table-valued and multi-statement table-valued), and each of them serves a different purpose; you can read more about functions/UDFs in Books Online (BOL).
UDFs have a big limitation: by definition they cannot change the state of the database. What I mean by this is that you cannot perform data manipulation operations (INSERT, UPDATE, DELETE, etc.) inside a UDF.
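A minimal sketch of a scalar UDF with hypothetical names -- reusable inline in queries, but with no data modification allowed inside:

CREATE FUNCTION dbo.fn_full_name (@first NVARCHAR(50), @last NVARCHAR(50))
RETURNS NVARCHAR(101)
AS
BEGIN
    RETURN @first + N' ' + @last;   -- pure computation only; no INSERT/UPDATE/DELETE here
END;

SELECT dbo.fn_full_name(first_name, last_name) AS full_name
FROM dbo.customers;   -- usable inside a query, unlike a stored procedure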
SPs are good for DDL statements, which you can't do with functions. SPs and user-defined functions both accept parameters and can return values, but they can't run the same kinds of statements:
user-defined functions can only do DML statements,
and a view doesn't accept parameters and only accepts DML (SELECT) statements.
I hope the information below will help you understand the use of SQL procedures, views, and functions.
Stored Procedure - A stored procedure can be used for any database operation like insert, update, delete and fetch, which, as you mentioned, you are already using.
View - A view can only be used to fetch data, and it has limitations: you can't pass parameters to a view, e.g. to filter the data based on a passed parameter.
Function - A function is usually used for a specific operation; SQL Server also has many built-in functions for dates, math, string manipulation, etc.
I'll make it very short and straight.
When you are accessing data from different tables and don't want to pass a parameter, use a view.
When you want to perform DML statements, go for a function.
When you want to perform DDL statements, go for a stored procedure.
The rest depends on your knowledge and whatever idea hits your mind at a particular point in time.
And for performance reasons many would argue:
- avoid functions (especially scalar ones) if possible
- it's easier to tweak stored procedures (query plans) and views
IMO, views (and indexed views) are just a fancier SELECT, while stored procedures are versatile because you can transform/manipulate data within them.

How to run multiple stored procedures in parallel using SSIS?

We have a list of stored procedures (more than 1000) in a table which need to be executed every morning.
The stored procedures do not have any dependency with each other.
We have tried a WHILE loop and a cursor, but they take a lot of time to execute.
We thought of creating a job for each stored procedure and calling them using sp_start_job (sp_start_job is called in an async manner), and we got a level of parallelism.
Problems arise when a new stored procedure is added to the list and the number of jobs becomes huge:
sometimes people forget to create the job for a new stored procedure
the DB gets bombarded with a large number of jobs (a manageability issue for the DBA)
Note: the list may be altered on any day (stored procedures can be added to or removed from the list).
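For reference, a minimal sketch of the sp_start_job approach described above, with illustrative job names (one Agent job wrapping each stored procedure):

EXEC msdb.dbo.sp_start_job @job_name = N'job_usp_refresh_sales';
EXEC msdb.dbo.sp_start_job @job_name = N'job_usp_refresh_stock';
-- sp_start_job returns as soon as the job is queued, so the jobs run in parallel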
If the SPs run for a long time, I would categorize the 1000 SPs into 5-10 groups, create one SSIS package for each group, and then an Agent job for each package. Then, schedule those jobs at the same time.
There are many ways to achieve it, like loops and scripting, and multiple factors involved. You can test different approaches and go with the best one.
Note: performance of the SSIS execution depends on your memory, processor and hardware.
Adding to @Nick.MacDermaid - you can utilize the MaxConcurrentExecutables property of the package to implement custom parallelism. Of course you would need multiple containers and corresponding stored proc groups.
Parallel Execution in SSIS
MaxConcurrentExecutables, a property of the package, defines how many tasks (executables) can run simultaneously. It defaults to -1, which is translated to the number of processors plus 2. Please note that if your box has hyperthreading turned on, it is the logical processors rather than the physically present processors that are counted.
You can use the following piece of code to generate a script that runs all your stored procedures; if you add a new procedure, it will automatically be included in the list:
SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE';
After this you take the result set, put it into a tab-delimited text file and save the file in a location.
Use this link to import the text into an Execute SQL Task; the first answer works well:
SSIS: How do I pull a SQL statement from a file into a string variable?
Execute the task and it should work. If you need to narrow the list of procedures, you can use a specific prefix in the procedure names and filter on it in the WHERE clause, as in the example below.
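For example (the prefix is illustrative):

SELECT 'EXEC ' + SPECIFIC_NAME + ';' AS [Command]
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE'
  AND SPECIFIC_NAME LIKE 'usp_morning_%';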
It will run serially, though; sorry, I don't have enough rep to comment yet.

How temporary tables behave in Stored Procedure of sybase

I have tried finding the answer for my issue but there were no appropriate answers found.
We have a Java web application where the data is loaded on the launch screen based on the user roles.
The data on launch screen is fetched by executing a stored procedure which in turn returns a ResultSet and then data is processed from it to show on launch screen.
My Queries:
If multiple people launch the web application simultaneously, will the stored procedure get executed multiple times? Will the executions run in parallel against a single instance of the database stored procedure, or does the database create a new instance of the stored procedure for every request? Basically I am curious to know what happens behind the scenes in this scenario.
Note: inside this Sybase ASE stored procedure we use a lot of temporary tables, into which data is inserted and removed based on several conditions. Based on their roles, different users will get different results.
What is the scope of a temporary table within a stored procedure, and, as mentioned in point 1, if multiple requests access the stored procedure in parallel, what will be the impact on the temporary tables?
And based on point 2, is there a chance of database blocking or deadlock situations occurring because of temporary tables within a stored procedure?
1: two executions of the same stored proc are completely independent (there may be some commonalities in terms of the query plan, but that does not affect the results)
2: see 1. temp tables are specific to the stored proc invocation and the user's session; temp tables are dropped automatically at the end of the proc (if you didn't drop them already).
3: there cannot be locking/blocking issues on the temp tables themselves. But there can of course always be locking/blocking issues on other tables being queried (for example, to populate the temp tables). Nothing special here.
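A minimal sketch illustrating that scope, with hypothetical table and column names (Sybase ASE T-SQL):

CREATE PROCEDURE dbo.load_launch_data
    @role_name varchar(30)
AS
BEGIN
    -- #launch_items is private to this invocation/session; concurrent callers
    -- running the same proc each get their own copy
    SELECT item_id, item_name
    INTO #launch_items
    FROM dbo.launch_items
    WHERE role_name = @role_name

    SELECT item_id, item_name
    FROM #launch_items

    -- the temp table is dropped automatically when the procedure returns,
    -- even without an explicit DROP TABLE
END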
