How do I schedule a cron job to run at app startup in my GAE application? I just want it to run once, at startup.
Cron jobs are used for tasks that should run independently of your client application. For example, you may need a cron job to update the totals in your database at the end of the day, or to periodically clean up stale session objects. Typically, you specify when a cron job should run: e.g. "every midnight".
If you need to execute a task when your application loads, you can simply execute it from your application.
Related
We have a Flink Session Cluster that runs multiple jobs within it. I'm trying to see whether, when a job restarts, we can keep the jobID sticky (i.e. unchanged across the restart), or whether we can link the new job to the old one somehow.
Say a job with ID X is running on a session cluster and it restarts for some reason (the job itself, not the job or task manager). Does the new job get a new jobID Y, or does it keep the same X? Basically, I would like to be able to tell that they're the same job.
What options do I have in this regard?
I have an Apache Flink cluster (1.14.5) running on Kubernetes (AKS). The cluster is running in session mode, and I trigger batch jobs on it periodically.
The job manager JVM metaspace was originally configured to 256 MB and later increased to 512 MB. After I run the batch pipeline 3 to 4 times, the metaspace fills up completely, and the cluster won't load any new batch job until I restart the job manager.
I do not see any classloader leaks in the application code. The triggered batch pipelines also run to completion and are marked FINISHED.
I have never seen the job manager's metaspace come down since the cluster was restarted; it keeps growing with every new batch pipeline run until new batch jobs start being rejected.
I need to know how Flink manages/cleans up the job manager metaspace periodically, or whether it does so at all. Please suggest/help.
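For reference, the metaspace limit mentioned above is configured in flink-conf.yaml (the option name exists in Flink 1.14; the value shown is the one from the question):

```yaml
# flink-conf.yaml -- metaspace limit for the JobManager JVM process.
# (The TaskManager has an analogous taskmanager.memory.jvm-metaspace.size.)
jobmanager.memory.jvm-metaspace.size: 512m
```

Raising the limit only buys headroom: Flink creates a classloader per submitted job and releases it when the job finishes, but if something pins those classloaders (lingering user threads, statically registered JDBC drivers, and similar leaks), their classes can never be garbage-collected and metaspace will keep growing with every submission even though the jobs reach FINISHED.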
In the Dataflow FAQ, it is listed that running custom (cron) job processes on Compute Engine is a way to schedule Dataflow pipelines. I am confused about how exactly that should be done: how to start the Dataflow job on Compute Engine, and how to set up the cron job.
Thank you!
You can use Google Cloud Scheduler to execute your Dataflow job. Cloud Scheduler supports several targets (HTTP/S endpoints, Pub/Sub topics, App Engine applications), and you can use your Dataflow template as a target. Review this external article for an example: Schedule Your Dataflow Batch Jobs With Cloud Scheduler, or, if you want to add more services to the interaction: Scheduling Dataflow Pipeline using Cloud Run, PubSub and Cloud Scheduler.
I have this working on App Engine, but I imagine it's similar for Compute Engine.
Cron will hit an endpoint on your service at the frequency you specify, so you need to set up a request handler for that endpoint that launches the Dataflow job when hit (essentially, in your request handler you define your pipeline and then call 'run' on it).
That should be the basics of it. As an extra step, I have the request handler for my cron job launch a Cloud Task, and the request handler for that Cloud Task then launches the Dataflow job. I do this because I've noticed the 'run' command for pipelines sometimes takes a while, and Cloud Tasks have a 10 minute timeout, compared to the 30s timeout for cron jobs (or was it 60s).
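A hedged sketch of such a handler: the project, region, and bucket names below are placeholders (not values from the question), the pipeline body is a trivial read/write, and actually submitting it requires the apache-beam[gcp] package.

```python
# Sketch of a request handler that defines a pipeline and calls run() on it.
# All GCP identifiers below are placeholders.

def launch_dataflow_job():
    """Define the pipeline and submit it to Dataflow via run()."""
    import apache_beam as beam  # imported lazily; needs apache-beam[gcp]
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                # placeholder
        region="us-central1",                # placeholder
        temp_location="gs://my-bucket/tmp",  # placeholder
    )
    pipeline = beam.Pipeline(options=options)
    (pipeline
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")
     | "Write" >> beam.io.WriteToText("gs://my-bucket/output"))
    # run() returns once the job is submitted, but submission can be slow,
    # which is why moving it behind a Cloud Task (10 min timeout) is safer
    # than doing it directly in the cron handler.
    return pipeline.run()

def handle_cron(request):
    """Hypothetical endpoint handler that cron (or a Cloud Task) hits."""
    launch_dataflow_job()
    return "job submitted", 200
```

Wire `handle_cron` up to whatever web framework serves your cron endpoint; the framework and handler signature here are assumptions.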
We have 2 Jobs created in SQL Server Agent.
PreLoad
DWHLoad
Job step list in both Jobs have various steps.
DWHLoad needs to be run after successful completion of PreLoad Job.
As of now, I've scheduled PreLoad to run at 1:00 AM, and it finishes at 5:00 AM.
DWHLoad is scheduled to run at 6:00 AM to avoid issues if PreLoad is delayed for any reason.
I could fold the PreLoad steps into DWHLoad and run everything as one job to maintain the dependency.
However, there are occasions where I need to run PreLoad separately, and the same is true of DWHLoad.
Is there a way to create dependency on Job and not on Job step?
i.e. Start DWHLoad only after successful completion of the PreLoad job?
Keep the 2 jobs you have and remove their schedules. This allows you to right-click and start either job manually for the times you want to run them separately. Since each job has multiple steps, you will then need to create a 3rd job with the combined steps from both jobs, in the order you need. Add a schedule to the 3rd job and you will have the dependency you are wanting with a scheduled job.
I have many Talend jobs, and I execute all of them through the Windows Task Scheduler. I need to store the time at which those jobs ran when they were scheduled. How can I get that? Please assist me on this.
Windows Task Scheduler already stores a history of executed scheduled tasks, where you can see task executions over the last hour, day, week or 30 days.
If you're looking for something more programmatic and accessible from outside of Windows Task Scheduler, then you could output something in the Talend job itself.
For example, you could start your job with a quick entry to a log file or database table containing the date-time of execution and the job name (retrievable from Talend's time-handling functions and system variables; try hitting Ctrl+Space to see the list of variables available to you). Then finish the job with another entry in the same log file or table recording whether it succeeded or failed, the time the job finished, and the job name.
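The start/finish logging pattern above can be sketched as follows (in Python rather than a Talend component, purely for illustration; the log file path and job name are placeholders):

```python
# Illustrative sketch of the logging pattern: one timestamped entry when the
# job starts, another with the outcome when it finishes.
import datetime

LOG_FILE = "job_runs.log"  # placeholder log destination

def log_event(job_name, event):
    """Append a tab-separated line: timestamp, job name, event."""
    timestamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(LOG_FILE, "a") as f:
        f.write(f"{timestamp}\t{job_name}\t{event}\n")

def run_job(job_name, task):
    """Wrap a job body with start/finish log entries."""
    log_event(job_name, "STARTED")
    try:
        task()
        log_event(job_name, "SUCCEEDED")
    except Exception:
        log_event(job_name, "FAILED")
        raise

run_job("example_talend_job", lambda: None)  # placeholder job body
```

Pointing `log_event` at a database table instead of a flat file is a drop-in change and makes the history queryable alongside your other job metadata.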