We are using Control-M to submit several batch jobs to a legacy application, but due to limitations, the only way to monitor their status is by querying the Process table of the DB.
Process table:

JobNum  JobStat   Batch
1       Finished  ABC
2       Failed    ABC
3       Started   ABC
4       Started   ABC
I am trying to use a cyclic Database job to query the jobs and rerun every 5 minutes while there are still jobs in Started, but have it break when:
The query result is empty (No jobs exist for that batch) - set to Not OK
All jobs are in either Finished or Failed in the batch - set to OK
Currently, I am trying to do something like:
SELECT 'TotalJobs', COUNT(JobNum)
FROM Process
WHERE Batch = 'ABC'
SELECT 'StartedJobs', COUNT(JobNum)
FROM Process
WHERE Batch = 'ABC'
AND JobStat = 'Started'
SELECT 'CompletedJobs', COUNT(JobNum)
FROM Process
WHERE Batch = 'ABC'
AND JobStat IN ('Finished', 'Failed')
Then using On-Do Actions with Specific statements like -
Statement: *
Code: TotalJob,0
Set-NotOK
Statement: *
Code: StartedJobs,0
Set-OK
But it does both actions...
Is it possible to do this more complex analysis with On-Do Actions?
Thanks
Looks like it should work. Could you share a screenshot of the output (sysout) and the job log from a run where it matches both scenarios? That will show exactly what was matched. Possibly the output includes more than intended.
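One way to avoid both actions firing could be to collapse the analysis into a single query that returns exactly one code per scenario, so each Specific-statement pattern can only match one of them. A rough sketch, assuming the same Process table and batch name:

SELECT CASE
         WHEN COUNT(JobNum) = 0 THEN 'NO_JOBS'        -- empty batch -> Set Not OK
         WHEN SUM(CASE WHEN JobStat = 'Started' THEN 1 ELSE 0 END) = 0
              THEN 'ALL_DONE'                          -- nothing still running -> Set OK
         ELSE 'STILL_RUNNING'                          -- keep cycling
       END AS BatchState
FROM Process
WHERE Batch = 'ABC'

The On-Do Specific statements could then match on the codes NO_JOBS, ALL_DONE and STILL_RUNNING, so only one action can fire per run.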
For example, running sql-client.sh embedded and executing:
insert into wap_fileused_daily (orgId, pdate, platform, platform_count)
select u.orgId, u.pdate, coalesce(p.platform, 'other'), sum(u.isMessage) as platform_count
from users as u
left join ua_map_platform as p on u.uaType = p.uatype
where u.isMessage = 1
group by u.orgId, u.pdate, p.platform
The job shows up as running (screenshot omitted), but there will never be any checkpoint.
Questions: 1) How can I trigger a checkpoint (alert job)? 2) How can I recover in case of failure?
You can specify execution configuration parameters in the SQL Client YAML file. For example, the following should work:
configuration:
  execution.checkpointing.interval: 42
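Depending on the Flink version, the SQL Client may also accept this as a SET statement in the session itself (a sketch, assuming a release that supports SET for configuration options; check your version's documentation):

SET 'execution.checkpointing.interval' = '3min';

After that, the INSERT statement above should produce a job that checkpoints at the configured interval.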
There is a feature request on Flink: https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished
I'm testing Snowflake. To do this I created a Snowflake instance on GCP.
One of the tests is the daily load of data from a STORAGE INTEGRATION.
To do that I generated the STORAGE INTEGRATION and the stage.
I tested the copy:
copy into DEMO_DB.PUBLIC.DATA_BY_REGION from @sg_gcs_covid pattern='.*data_by_region.*'
and all goes fine.
Now it's time to test the daily scheduling with the task statement.
I created this task:
CREATE TASK schedule_regioni
WAREHOUSE = COMPUTE_WH
SCHEDULE = 'USING CRON 42 18 9 9 * Europe/Rome'
COMMENT = 'Test Schedule'
AS
copy into DEMO_DB.PUBLIC.DATA_BY_REGION from @sg_gcs_covid pattern='.*data_by_region.*';
And I enabled it:
alter task schedule_regioni resume;
I got no errors, but the task doesn't load data.
To resolve the issue I had to put the copy into a stored procedure and call the stored procedure from the task instead of running the copy directly:
DROP TASK schedule_regioni;
CREATE TASK schedule_regioni
WAREHOUSE = COMPUTE_WH
SCHEDULE = 'USING CRON 42 18 9 9 * Europe/Rome'
COMMENT = 'Test Schedule'
AS
call sp_upload_c19_regioni();
The question is: is this desired behavior or an issue (as I suppose)?
Can someone give me some information about this?
I've just tried this (but with the storage integration and stage on AWS S3) and it works fine, also when using the COPY command directly in the SQL part of the task, without calling a stored procedure.
In order to start investigating the issue, I would check the following info (for debugging I would create the task with a schedule of every few minutes):
Check task_history and verify the executions:
select *
from table(information_schema.task_history(
scheduled_time_range_start=>dateadd('hour',-1,current_timestamp()),
result_limit => 100,
task_name=>'YOUR_TASK_NAME'));
If the previous step is successful, check copy_history and verify that the input file name, target table and number of records/errors are the expected ones:
SELECT *
FROM TABLE (information_schema.copy_history(TABLE_NAME => 'YOUR_TABLE_NAME',
start_time=> dateadd(hours, -1, current_timestamp())))
ORDER BY 3 DESC;
Check whether the results are the same as those you get when the task with the stored procedure call is executed.
Please also confirm that you are loading new files not yet loaded into your table with the COPY command (otherwise you need to specify the FORCE = TRUE parameter in the copy command, or clear the load metadata by truncating your target table, in order to reload the same files).
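If the same files do need to be reloaded during testing, a sketch of the COPY with FORCE, reusing the stage and table names from the question:

copy into DEMO_DB.PUBLIC.DATA_BY_REGION
from @sg_gcs_covid
pattern = '.*data_by_region.*'
force = TRUE;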
I am new to Snowflake and am trying to create my first task.
CREATE TASK task_update_table
WAREHOUSE = "TEST"
SCHEDULE = 'USING CRON 0 5 * * * America/Los_Angeles'
AS
INSERT INTO "TEST"."WEB"."SOME_TABLE" (ID,VALUE1,VALUE2,VALUE3)
WITH CTE AS
(SELECT
ID
,VALUE1
,VALUE2
,VALUE3
FROM OTHER_TABLE
WHERE ID NOT IN (SELECT ID FROM "TEST"."WEB"."SOME_TABLE")
)
SELECT
ID,VALUE1,VALUE2,VALUE3
FROM CTE
I got a message that the task was created successfully
"Task task_update_table successfully created"
I then try to run show tasks in schema: SHOW TASKS IN "TEST"."WEB" and get 0 rows as a result. What am I doing wrong? Why is the task not showing?
I did all of this under sysadmin and was using the same warehouse, db and schema.
There are some limitations around show commands that might be blocking you,
particularly "SHOW commands only return objects for which the current user’s current role has been granted the necessary access privileges".
https://docs.snowflake.com/en/sql-reference/sql/show.html#general-usage-notes
I suspect the task was created by a different role (therefore owned by a different role), or perhaps it was created in different database or schema.
To find it, I'd recommend running the following using a role such as ACCOUNTADMIN.
show tasks in account;
SELECT *
FROM (
SELECT *
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID())))
WHERE "name" = 'TASK_UPDATE_TABLE';
While testing and learning in Snowflake, it is critical you set your session "context" correctly, using commands like this:
USE ROLE my_role_here;
USE WAREHOUSE my_warehouse_here;
USE DATABASE my_database_here;
USE SCHEMA my_schema_here;
Running those four commands, or setting defaults for them on your user, will help you tremendously when learning.
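Setting the defaults could look something like this (a sketch; my_user_here and the values are placeholders to adjust):

ALTER USER my_user_here SET DEFAULT_ROLE = 'SYSADMIN';
ALTER USER my_user_here SET DEFAULT_WAREHOUSE = 'TEST';
ALTER USER my_user_here SET DEFAULT_NAMESPACE = 'TEST.WEB';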
I hope this helps...Rich
I have a task scheduled to run every 15 minutes:
CREATE OR REPLACE TASK mytask
WAREHOUSE = 'SHARED_WH_MEDIUM'
SCHEDULE = '15 MINUTE'
STATEMENT_TIMEOUT_IN_SECONDS = 3600,
QUERY_TAG = 'KLIPFOLIO'
AS
CREATE OR REPLACE TABLE mytable AS
SELECT * from xxx;
;
alter task mytask resume;
I see from the output of task_history() that the task is SCHEDULED:
select * from table(aftonbladet.information_schema.task_history(task_name => 'MYTASK')) order by scheduled_time;
QUERY_ID: ***
NAME: MYTASK
DATABASE_NAME: ***
SCHEMA_NAME: ***
QUERY_TEXT: ***
STATE: SCHEDULED
SCHEDULED_TIME: 2020-01-21 09:58:12.434 +0100
(CONDITION_TEXT, ERROR_CODE, ERROR_MESSAGE, COMPLETED_TIME and RETURN_VALUE are empty)
but I want it to run right now without waiting for the SCHEDULED_TIME. Is there any way to accomplish that?
Snowflake now supports running tasks manually. Just use the EXECUTE TASK command:
EXECUTE TASK manually triggers an asynchronous single run of a scheduled task (either a standalone task or the root task in a task tree) independent of the schedule defined for the task. A successful run of a root task triggers a cascading run of child tasks in the tree as their precedent task completes, as though the root task had run on its defined schedule.
Also, there is no need for the task to be in a started state. Even tasks in suspended mode can be executed manually.
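For the task from the question, that is simply:

EXECUTE TASK mytask;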
There is no way currently to execute a task manually. You could, however, alter the task schedule to 1 minute, let it run, and then alter it back to 15 minutes, so that you're not waiting the full 15 minutes. I have seen this request multiple times, and there is an Idea on Lodge (https://community.snowflake.com/s/ideas) that you should upvote (search for 'Tasks' and I think it'll be one of the top ideas). Since Tasks are still in Public Preview, it's likely that these types of ideas will be reviewed and prioritized if they have a lot of votes.
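A sketch of that workaround (suspending first, since a task generally has to be suspended before it can be altered):

ALTER TASK mytask SUSPEND;
ALTER TASK mytask SET SCHEDULE = '1 MINUTE';
ALTER TASK mytask RESUME;
-- after it has run, restore the original schedule the same way
ALTER TASK mytask SUSPEND;
ALTER TASK mytask SET SCHEDULE = '15 MINUTE';
ALTER TASK mytask RESUME;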
To build on Mike's answer:
You could have a task that executes every minute, but only if there's data on the stream!
For this you can create a table and stream just to decide if the task will be triggered every minute or not.
This root task should delete the data inserted into the driving table, which consumes the stream and prevents the task from running again.
So then you can have dependent tasks that execute every time you bring data into the stream, but only when the stream has new data.
This relies on the ability to run a task only when SYSTEM$STREAM_HAS_DATA() returns true:
-- stream so this task executes every minute, but only if there's new data
create table just_timestamps_stream_table(value varchar);
create stream just_timestamps_stream on table just_timestamps_stream_table;
-- https://docs.snowflake.com/en/user-guide/tasks-intro.html
create or replace task mytask_minute
warehouse = test_small
schedule = '1 MINUTE'
when SYSTEM$STREAM_HAS_DATA('just_timestamps_stream')
as
-- consume the stream so the task doesn't execute again
delete from just_timestamps_stream_table;
-- the real task to be executed
create or replace task mytask_minute_child1
warehouse = test_small
after mytask_minute
as
insert into just_timestamps values(current_timestamp, 'child1');
Full example:
https://github.com/fhoffa/snowflake_snippets/blob/main/stream_and_tasks/minimal.sql
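With this in place, a run can be triggered on demand simply by inserting a row into the driving table (the value is arbitrary):

insert into just_timestamps_stream_table values ('run now');

Within the next minute the root task sees that the stream has data, consumes it, and the child task executes.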
We have different SSIS packages that we use for daily tasks (updates, ETL...), and we have a fairly complicated structure where a package calls several other packages. There are about 10 principal jobs that call secondary ones. These 10 jobs are always on success, even if a step fails, so that a failure doesn't block other executions. We would like to retrieve the steps (and their status) related to these jobs via a SQL query, but we couldn't join the steps to their calling jobs and at the same time retrieve the status (the step status in this case, not the job status).
I searched a lot on the net and I always find a script that either joins the steps and calling jobs without the status, or the steps and status without knowing which job is calling...
(for example this link and this one)
So to sum it all up, we are trying to write a query where we can join the job steps, their status and their parent job.
Any help in this matter would be really appreciated and thanks in advance.
EDIT
Thanks to the link in @BaconBits' comment, I was able to create a query joining three tables (msdb.dbo.sysjobsteps, msdb.dbo.sysjobs, msdb.dbo.sysjobhistory) that retrieves something like the following:
Job_name1 Step_name1 Job1_status
Job_name1 Step_name2 Job1_status
Job_name1 Step_name3 Job1_status
Job_name2 Step_name1 Job2_status
Job_name2 Step_name2 Job2_status
But I still couldn't retrieve the step status (which is what I need in this case, since the job outcome is always success even if a step fails).
Query:
select j.name, s.step_name,
CASE WHEN s.last_run_outcome=0 THEN 'Failed'
WHEN s.last_run_outcome=1 THEN 'Success'
WHEN s.last_run_outcome=2 THEN 'Retry'
WHEN s.last_run_outcome=3 THEN 'Canceled'
END
,h.run_date, s.output_file_name
from msdb.dbo.sysjobsteps s
inner join msdb.dbo.sysjobs j on s.job_id=j.job_id
inner join msdb.dbo.sysjobhistory h
on h.job_id=j.job_id or s.step_id=h.step_id
--where j.name like '%Dem%'
order by h.run_date, j.name
Thank you @BaconBits and anyone else for any further help.
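For comparison, a rough sketch of reading the per-step outcome straight from msdb.dbo.sysjobhistory (its run_status column holds each step's outcome, and the step_id = 0 row is the overall job outcome), in case it is useful to anyone with the same problem:

SELECT j.name AS job_name,
       h.step_id,
       h.step_name,
       CASE h.run_status
            WHEN 0 THEN 'Failed'
            WHEN 1 THEN 'Succeeded'
            WHEN 2 THEN 'Retry'
            WHEN 3 THEN 'Canceled'
            WHEN 4 THEN 'In Progress'
       END AS step_status,
       h.run_date,
       h.run_time
FROM msdb.dbo.sysjobhistory h
INNER JOIN msdb.dbo.sysjobs j ON j.job_id = h.job_id
WHERE h.step_id > 0   -- steps only; remove to also see the job outcome rows
ORDER BY h.run_date, h.run_time, j.name;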