Snowflake task scheduling

I have to write a Snowflake task that runs every day, every 2 minutes, from 5:00 AM EST to 5:00 PM EST.
The code I wrote is not working; the task didn't stop running even after 5:00 PM:
CREATE OR REPLACE TASK tsk_master
WAREHOUSE = XS_WH
SCHEDULE = 'USING CRON * 5-17 * * * America/New_York'
TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
COMMENT = 'Master task job to trigger all other tasks'
AS call pntinsight_lnd.SP_ACCT_DIM_1();
Please suggest what I did wrong, how to stop it from running after 5 PM, and how to set it to run every 2 minutes.

One option is to list all the trigger minutes explicitly; it looks ugly, but it should work:
CREATE OR REPLACE TASK tsk_master
WAREHOUSE = XS_WH
SCHEDULE = 'USING CRON 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58 5-17 * * * America/New_York'
TIMESTAMP_INPUT_FORMAT = 'YYYY-MM-DD HH24'
COMMENT = 'Master task job to trigger all other tasks'
AS call pntinsight_lnd.SP_ACCT_DIM_1();

"Snowflake task to run every day, every 2 minutes, from 5:00 AM EST to 5:00 PM EST."
The cron syntax in SCHEDULE supports optional parameters (from the Snowflake docs):
/n indicates the nth instance of a given unit of time. Each quanta of time is computed independently.
So every 2 minutes is:
SCHEDULE = 'USING CRON */2 5-17 * * * America/New_York'
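Note that the hour field 5-17 matches all of hour 17, so this still fires through 5:58 PM. To stop firing after 5:00 PM, 5-16 would be used instead (the last run is then at 4:58 PM). Putting it together as a sketch, reusing the names from the question:
CREATE OR REPLACE TASK tsk_master
WAREHOUSE = XS_WH
-- */2 = every 2 minutes; hours 5-16 so nothing fires after 5:00 PM
SCHEDULE = 'USING CRON */2 5-16 * * * America/New_York'
COMMENT = 'Master task job to trigger all other tasks'
AS call pntinsight_lnd.SP_ACCT_DIM_1();
-- newly created tasks are suspended; resume to start the schedule
ALTER TASK tsk_master RESUME;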

Weekly and monthly CRON jobs

I have created two Snowflake tasks like the ones below, and they haven't worked so far.
-- 1: should run at the beginning of each week, at 5 AM CET
Create or replace task WEEKLY_JOB
warehouse = compute_wh
schedule = 'USING CRON 0 5 * * 1 CET'
AS
INSERT INTO ... ;
ALTER TASK WEEKLY_JOB RESUME;
-- 2: should run at the beginning of each month, at 5 AM CET
Create or replace task MONTHLY_JOB
warehouse = compute_wh
schedule = 'USING CRON 0 5 1 * * CET'
AS
INSERT INTO ...;
ALTER TASK MONTHLY_JOB RESUME;
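A first diagnostic step (a sketch; it assumes the tasks live in the current database and schema and that your role can see them) is to check whether the tasks are actually resumed and whether any runs were attempted:
-- state ('started' vs 'suspended') and schedule of the tasks
show tasks like 'WEEKLY_JOB';
show tasks like 'MONTHLY_JOB';
-- past and upcoming runs
select *
from table(information_schema.task_history(task_name => 'WEEKLY_JOB'))
order by scheduled_time desc;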

Snowflake Task does not start running

The task should run at 2:27 AM UTC, but it did not execute.
GRANT EXECUTE TASK ON ACCOUNT TO ROLE SYSADMIN;
CREATE or replace TASK TASK_DELETE3
WAREHOUSE = TEST
SCHEDULE = 'USING CRON 27 2 * * * UTC' as
CREATE OR REPLACE TABLE TEST2."PUBLIC"."DELETE"
CLONE TEST1."PUBLIC"."DELETE";
ALTER TASK TASK_DELETE3 RESUME;
The task state = started. Does anyone know why?
If the state shows that the task is started, that means it is enabled and will run at the scheduled times.
You can check the task history to see the previous runs and next run of the task using the following query:
select *
from table(information_schema.task_history(
task_name=>'TASK_DELETE3'));
I was using a different role when I was checking the table in the database. The task successfully completed at the scheduled time.

Snowflake task with CRON not scheduled

I have created a Snowflake task like the one below,
CREATE or replace TASK staging.task_name
WAREHOUSE = 'staging_warehouse'
SCHEDULE = 'USING CRON 0 1 * * * UTC'
AS
delete from staging....
but I don't see the task scheduled or executed when looking at the task history:
select *
from table(information_schema.task_history(
scheduled_time_range_start=>dateadd('hour',-10,current_timestamp()),
result_limit => 10,
task_name=>'task_name'));
I usually run tasks with a minute-based schedule, and this is the first time I'm using a cron schedule. What might I be missing here?
Can you enable the task and then run the query again? Tasks are created in a suspended state, so a newly created task never runs until it is resumed:
alter task staging.task_name resume;

Is there a way to force-run a Snowflake TASK now (before the next scheduled slot)?

I have a task scheduled to run every 15 minutes:
CREATE OR REPLACE TASK mytask
WAREHOUSE = 'SHARED_WH_MEDIUM'
SCHEDULE = '15 MINUTE'
STATEMENT_TIMEOUT_IN_SECONDS = 3600
QUERY_TAG = 'KLIPFOLIO'
AS
CREATE OR REPLACE TABLE mytable AS
SELECT * from xxx;
alter task mytask resume;
I see from the output of task_history() that the task is SCHEDULED:
select * from table(aftonbladet.information_schema.task_history(task_name => 'MYTASK')) order by scheduled_time;
NAME = MYTASK, STATE = SCHEDULED, SCHEDULED_TIME = 2020-01-21 09:58:12.434 +0100
(the remaining task_history columns, such as QUERY_ID, DATABASE_NAME, SCHEMA_NAME, and QUERY_TEXT, are masked as *** or empty)
but I want it to run right now, without waiting for the SCHEDULED_TIME. Is there any way to accomplish that?
Snowflake now supports running tasks manually. Just use the EXECUTE TASK command:
EXECUTE TASK manually triggers an asynchronous single run of a scheduled task (either a standalone task or the root task in a task tree) independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the tree as their precedent task completes, as though the root task had run on its defined schedule.
Also, the task does not need to be in the started state; even tasks in suspended mode can be executed manually.
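For the task from the question, that is a single statement:
EXECUTE TASK mytask;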
There is currently no way to execute a task manually. You could, however, alter the task schedule to 1 minute, let it run, and then alter it back to 15 minutes, so that you're not waiting the full 15 minutes. I have seen this request multiple times, and there is an Idea on Lodge (https://community.snowflake.com/s/ideas) that you should upvote (search for 'Tasks' and I think it'll be one of the top ideas). Since Tasks are still in Public Preview, it's likely that these types of ideas will be reviewed and prioritized if they have a lot of votes.
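A sketch of that workaround (a task generally must be suspended before its schedule can be changed):
ALTER TASK mytask SUSPEND;
ALTER TASK mytask SET SCHEDULE = '1 MINUTE';
ALTER TASK mytask RESUME;
-- after it has run once, restore the original schedule
ALTER TASK mytask SUSPEND;
ALTER TASK mytask SET SCHEDULE = '15 MINUTE';
ALTER TASK mytask RESUME;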
To build on Mike's answer:
You could have a task that executes every minute, but only if there's data on a stream.
For this you can create a table and a stream on it, just to decide whether the minutely task fires or not.
The root task should delete the data inserted into the staging table, to prevent the task from running again.
You can then have dependent tasks that execute every time you bring data into the stream, but only when the stream has new data.
This relies on the ability to run a task only when SYSTEM$STREAM_HAS_DATA() reports new data on the stream:
-- stream so this task executes every minute, but only if there's new data
create table just_timestamps_stream_table(value varchar);
create stream just_timestamps_stream on table just_timestamps_stream_table;
-- https://docs.snowflake.com/en/user-guide/tasks-intro.html
create or replace task mytask_minute
warehouse = test_small
schedule = '1 MINUTE'
when SYSTEM$STREAM_HAS_DATA('just_timestamps_stream')
as
-- consume the stream so the task doesn't execute again
delete from just_timestamps_stream_table;
-- the real task to be executed
create or replace task mytask_minute_child1
warehouse = test_small
after mytask_minute
as
insert into just_timestamps values(current_timestamp, 'child1');
Full example:
https://github.com/fhoffa/snowflake_snippets/blob/main/stream_and_tasks/minimal.sql
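To exercise the pipeline (a sketch; remember that the child task has to be resumed as well as the root):
-- start the tasks: resume the child first, then the root
alter task mytask_minute_child1 resume;
alter task mytask_minute resume;
-- any insert here makes SYSTEM$STREAM_HAS_DATA return true,
-- so the root task fires on its next minutely evaluation
insert into just_timestamps_stream_table values ('trigger');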

Flink SQL state checkpoint

When I process data with the Flink SQL API and restart the app, the sum result is not restored from the checkpoint. It still starts from 1.
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StateBackend stateBackend = new FsStateBackend("file:///D:/d_backup/github/flink-best-practice/checkpoint");
env.enableCheckpointing(1000 * 60);
env.setStateBackend(stateBackend);
// table environment creation (missing from the original snippet)
StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);
Table table = tableEnv.sqlQuery(
"select sum(area_id) " +
"from rtc_warning_gmys " +
"where area_id = 1 " +
"group by character_id,area_id,group_id,platform");
// convert the Table into a retract DataStream of Row.
// A retract stream of type X is a DataStream<Tuple2<Boolean, X>>.
// The boolean field indicates the type of the change.
// True is INSERT, false is DELETE.
DataStream<Tuple2<Boolean, Row>> dsRow = tableEnv.toRetractStream(table, Row.class);
dsRow.map(new MapFunction<Tuple2<Boolean,Row>, Object>() {
@Override
public Object map(Tuple2<Boolean, Row> booleanRowTuple2) throws Exception {
if(booleanRowTuple2.f0) {
System.out.println(booleanRowTuple2.f1.toString());
return booleanRowTuple2.f1;
}
return null;
}
});
env.execute("Kafka table select");
The log shows:
1
2
3
...
...
100
After restarting the app, it still starts from:
1
2
3
...
I think the sum value should be stored in the checkpoint file, so that on restart the app can read the last result from the checkpoint and continue like:
101
102
103
...
120
Some possibilities:
Did the job run long enough to complete a checkpoint? Just because the job produced output doesn't mean that a checkpoint was completed. I see you have checkpointing configured to occur once a minute, and the checkpoints could take some time to complete.
How was the job stopped? Unless they have been externalized, checkpoints are deleted when a job is cancelled (see the configuration sketch after this list).
How was the job restarted? Did it recover (automatically) from a checkpoint, or was it resumed from an externalized checkpoint or savepoint, or was it restarted from scratch?
This sort of experiment is easiest to do via the command line. You might, for example,
write an app that uses checkpoints, and has a restart strategy (e.g., env.setRestartStrategy(RestartStrategies.fixedDelayRestart(1000, 1000)))
start a local cluster
"flink run -d app.jar" to start the job
wait until at least one checkpoint has completed
"kill -9 task-manager-PID" to cause a failure
"taskmanager.sh start" to allow the job to resume from the checkpoint
