Executing Multiple Lines in a Snowflake Task - snowflake-cloud-data-platform

Executing Multiple Lines in a Snowflake Task - snowflake-cloud-data-platform

I created the task below and am having trouble getting it to execute all lines. It looks like it just does the first delete from productweekly_upload then completes. Anyone have any ideas? This is my first time using tasks
CREATE OR REPLACE TASK WeeklySymphony_Load
WAREHOUSE = UPLOADWAREHOUSE
SCHEDULE = 'USING CRON 10 8 * * MON America/New_York'
as
--run every monday at 8:10 am
delete from Productweekly_Upload;
delete from Factsweekly_Upload;
delete from Productweekly;
delete from Factsweekly;
copy into ProductWeekly_Upload
from #symphony_s3_stage/prasco_phast_it_prdct_wk_;
copy into FactsWeekly_Upload
from #symphony_s3_stage/prasco_phast_it_wk_;
insert into ProductWeekly
select * from ProductWeekly_Upload;
insert into FactsWeekly
select * from FactsWeekly_Upload;

You can only execute 1 command in a TASK. If you want to create multiple steps, you can either wrap these into a stored procedure and call the SP from the TASK, or you can create each step as a TASK and make those dependencies, so they execute in order.
I recommend a read-through of this document:
https://docs.snowflake.com/en/user-guide/tasks-intro.html

If you want multiple statement, the 3 solutions are :
Stored proc
Chain multiple tasking AFTER statement
Use Procedural logic with AS DECLARE // BEGIN // END; block
https://docs.snowflake.com/en/sql-reference/sql/create-task.html#procedural-logic-using-snowflake-scripting

Related

Snowflake Task does not start running

The task should run at 2:27am UTC, but it did not executed.
GRANT EXECUTE TASK ON ACCOUNT TO ROLE SYSADMIN;
CREATE or replace TASK TASK_DELETE3
WAREHOUSE = TEST
SCHEDULE = 'USING CRON 27 2 * * * UTC' as
CREATE OR REPLACE TABLE TEST2."PUBLIC"."DELETE"
CLONE TEST1."PUBLIC"."DELETE";
ALTER TASK TASK_DELETE3 RESUME;
The task [state] = started. Does anyone know why?

If the status shows that the task is started, that means it is enabled and will run in the scheduled times.
You can check the task history to see the previous runs and next run of the task using the following query:
select *
from table(information_schema.task_history(
task_name=>'TASK_DELETE3'));

I was using a different role when I was checking the table in the database. The Task successfully completed at scheduled time.

Is there a way to automatically unload data in Snowflake to local machine using tasks or stored procedure?

I would like to automatically unload my data using Task or Stored Procedure but they didn't work out, seems like the get command wasn't run, is there a way to fix it? Thank you,
CREATE OR REPLACE STREAM unload_dimads_stream
ON TABLE "FA_PROJECT01_DB"."ADSBI"."DIM_ADS";
CREATE OR REPLACE TASK unload_dimads_task1
WAREHOUSE = FA_PROJECT01_CLOUDDW
SCHEDULE = '1 minute'
WHEN SYSTEM$STREAM_HAS_DATA('unload_dimads_stream')
AS
COPY INTO #adsbi.%dim_ads from AdsBI.Dim_Ads file_format = (TYPE=CSV FIELD_DELIMITER = '|' BINARY_FORMAT = 'UTF-8' compression=none) header= true overwrite=true;
CREATE OR REPLACE TASK unload_dimads_task2
WAREHOUSE = FA_PROJECT01_CLOUDDW
AFTER unload_dimads_task1
WHEN SYSTEM$STREAM_HAS_DATA('unload_dimads_stream')
AS
GET #adsbi.%dim_ads file://D:\unload\table1;
ALTER TASK unload_dimads_task2 resume;
ALTER TASK unload_dimads_task1 resume;

GET #adsbi.%dim_ads file://D:\unload\table1;
You cannot run this step on a stored procedure or task. Stored procedures and tasks are running on Snowflake servers. You can script GET operations outside of Snowflake using the SnowSQL client, but something else will need to call it to run.

How to use copy Storage Integration in a Snowflake task statement?

I'm testing SnowFlake. To do this I created an instance of SnowFlake on GCP.
One of the tests is to try the daily load of data from a STORAGE INTEGRATION.
To do that I had generated the STORAGE INTEGRATION and the stage.
I tested the copy
copy into DEMO_DB.PUBLIC.DATA_BY_REGION from #sg_gcs_covid pattern='.*data_by_region.*'
and all goes fine.
Now it's time to test the daily scheduling with the task statement.
I created this task:
CREATE TASK schedule_regioni
WAREHOUSE = COMPUTE_WH
SCHEDULE = 'USING CRON 42 18 9 9 * Europe/Rome'
COMMENT = 'Test Schedule'
AS
copy into DEMO_DB.PUBLIC.DATA_BY_REGION from #sg_gcs_covid pattern='.*data_by_region.*';
And I enabled it:
alter task schedule_regioni resume;
I got no errors, but the task don't loads data.
To resolve the issue i had to put the copy in a stored procedure and insert the call of the storede procedure instead of the copy:
DROP TASK schedule_regioni;
CREATE TASK schedule_regioni
WAREHOUSE = COMPUTE_WH
SCHEDULE = 'USING CRON 42 18 9 9 * Europe/Rome'
COMMENT = 'Test Schedule'
AS
call sp_upload_c19_regioni();
The question is: this is a desired behavior or an issue (as I suppose)?
Someone can give to me some information about this?

I've just tried ( but with storage integration and stage on AWS S3) and it works fine also using copy command inside sql part of the task, without calling a stored procedure.
In order to start investigating the issue, I would check following info (maybe for debugging I would create the task scheduling it every few minutes):
check task_history and verify executions
select *
from table(information_schema.task_history(
scheduled_time_range_start=>dateadd('hour',-1,current_timestamp()),
result_limit => 100,
task_name=>'YOUR_TASK_NAME'));
if previous step is successfull, check copy_history and verify the input file name , target table and number of records/errors are the expected ones
SELECT *
FROM TABLE (information_schema.copy_history(TABLE_NAME => 'YOUR_TABLE_NAME',
start_time=> dateadd(hours, -1, current_timestamp())))
ORDER BY 3 DESC;
Check if the results are the same you get when the task with sp call is executed.
Please also confirm that you are loading new files not yet loaded into your table with COPY command (otherwise you need to specify FORCE = TRUE parameter in the copy command or remove metadata information truncating your target table to reload the same files).

Is there a way to force run a Snowflake's TASK now (before the next scheduled slot)?

I have a task scheduled to run every 15 minutes:
CREATE OR REPLACE TASK mytask
WAREHOUSE = 'SHARED_WH_MEDIUM'
SCHEDULE = '15 MINUTE'
STATEMENT_TIMEOUT_IN_SECONDS = 3600,
QUERY_TAG = 'KLIPFOLIO'
AS
CREATE OR REPLACE TABLE mytable AS
SELECT * from xxx;
;
alter task mytask resume;
I see from the output of task_history() that the task is SCHEDULED:
select * from table(aftonbladet.information_schema.task_history(task_name => 'MYTASK')) order by scheduled_time;
QUERY_ID NAME DATABASE_NAME SCHEMA_NAME QUERY_TEXT CONDITION_TEXT STATE ERROR_CODE ERROR_MESSAGE SCHEDULED_TIME COMPLETED_TIME RETURN_VALUE
*** MYTASK *** *** *** SCHEDULED 2020-01-21 09:58:12.434 +0100
but I want it to run right now without waiting for the SCHEDULED_TIME , is there any way to accomplish that?

Snowflake now supports running tasks manually. Just use the EXECUTE TASK command:
EXECUTE TASK manually triggers an asynchronous single run of a scheduled task (either a standalone task or the root task in a task tree) independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the tree as their precedent task completes, as though the root task had run on its defined schedule.
Also, there is no need for the task in started mode. Even tasks in suspended mode can be executed manually.

There is no way currently to execute a task manually. You could, however, alter the task schedule to 1 minute, let it run, and then alter it back to 15 minutes, so that you're not waiting the full 15 minutes. I have seen this request multiple times, and there is an Idea on Lodge (https://community.snowflake.com/s/ideas) that you should upvote (search for 'Tasks' and I think it'll be one of the top ideas). Since Tasks are still in Public Preview, it's likely that these types of ideas will be reviewed and prioritized if they have a lot of votes.

To build on Mike's answer:
You could have a task that executes every minute, but only if there's data on the stream!
For this you can create a table and stream just to decide if the task will be triggered every minute or not.
This root task should delete the data inserted in the stream to prevent the task running again.
So then you can have dependent tasks that execute every time you bring data into the stream, but only when the stream has new data.
This relies on the ability to run a task only when SYSTEM$STREAM_HAS_DATA()
-- stream so this task executes every minute, but only if there's new data
create table just_timestamps_stream_table(value varchar);
create stream just_timestamps_stream on table just_timestamps_stream_table;
-- https://docs.snowflake.com/en/user-guide/tasks-intro.html
create or replace task mytask_minute
warehouse = test_small
schedule = '1 MINUTE'
when SYSTEM$STREAM_HAS_DATA('just_timestamps_stream')
as
-- consume stream so tasks doesn't execute again
delete from just_timestamps_stream_table;
-- the real task to be executed
create or replace task mytask_minute_child1
warehouse = test_small
after mytask_minute
as
insert into just_timestamps values(current_timestamp, 'child1');
Full example:
https://github.com/fhoffa/snowflake_snippets/blob/main/stream_and_tasks/minimal.sql

I am trying to run multiple query statements created when using the python connector with the same query id

I have created a Python function which creates multiple query statements.
Once it creates the SQL statement, it executes it (one at a time).
Is there anyway to way to bulk run all the statements at once (assuming I was able to create all the SQL statements and wanted to execute them once all the statements were generated)? I know there is an execute_stream in the Python Connector, but I think this requires a file to be created first. It also appears to me that it runs a single query statement at a time."
Since this question is missing an example of the file, here is a file content that I have provided as extra that we can work from.
//connection test file for python multiple queries
import snowflake.connector
conn = snowflake.connector.connect(
user = 'xxx',
password = '',
account = 'xxx',
warehouse= 'xxx',
database= 'TEST_xxx'
session_parameters = {
'QUERY_TAG: 'Rachel_test',
}
}
while(conn== true){
print(conn.sfqid)import snowflake.connector
try:
conn.cursor().execute("CREATE WAREHOUSE IF NOT EXISTS tiny_warehouse_mg")
conn.cursor().execute("CREATE DATABASE IF NOT EXISTS testdb_mg")
conn.cursor().execute("USE DATABASE testdb_mg")
conn.cursor().execute(
"CREATE OR REPLACE TABLE "
"test_table(col1 integer, col2 string)")
conn.cursor().execute(
"INSERT INTO test_table(col1, col2) VALUES " +
" (123, 'test string1'), " +
" (456, 'test string2')")
break
except Exception as e:
conn.rollback()
raise e
}
conn.close()
The reference to this question refers to a method that can be done with the file call, the example in documentation is as follows:
from codecs import open
with open(sqlfile, 'r', encoding='utf-8') as f:
for cur in con.execute_stream(f):
for ret in cur:
print(ret)
Reference to guide I used
Now when I ran these, they were not perfect, but in practice I was able to execute multiple sql statements in one connection, but not many at once. Each statement had their own query id. Is it possible to have a .sql file associated with one query id?

Is it possible to have a .sql file associated with one query id?
You can achieve that effect with the QUERY_TAG session parameter. Set the QUERY_TAG to the name of your .SQL file before executing it's queries. Access the .SQL file QUERY_IDs later using the QUERY_TAG field in QUERY_HISTORY().

I believe though you generated the .sql while executing in snowflake each statement will have unique query id.
If you want to run one sql independent to other you may try with multiprocessing/multi threading concept in python.

The Python and Node.Js libraries do not allow multiple statement executions.
I'm not sure about Python but for Node.JS there is this library that extends the original one and add a method call "ExecutionAll" to it:
snowflake-multisql

You just need to wrap multiple statements with the BEGIN and END.
BEGIN
<statement_1>;
<statement_2>;
END;
With these operators, I was able to execute multiple statement in nodejs

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Executing Multiple Lines in a Snowflake Task - snowflake-cloud-data-platform

If you want multiple statement, the 3 solutions are : Stored proc Chain multiple tasking AFTER statement Use Procedural logic with AS DECLARE // BEGIN // END; block https://docs.snowflake.com/en/sql-reference/sql/create-task.html#procedural-logic-using-snowflake-scripting

Related

Snowflake Task does not start running

Is there a way to automatically unload data in Snowflake to local machine using tasks or stored procedure?

How to use copy Storage Integration in a Snowflake task statement?

Is there a way to force run a Snowflake's TASK now (before the next scheduled slot)?

I am trying to run multiple query statements created when using the python connector with the same query id

Categories

Resources